PROCESS FOR THE DEVELOPMENT OF BINDING MINI-PROTEINS
BACKGROUND OF THE INVENTION 

Field of the Invention

This invention relates to development of novel binding
mini-proteins, and especially micro-proteins, by an
iterative process of mutagenesis, expression, affinity
selection, and amplification. In this process, a gene
encoding a mini-protein potential binding domain, said gene
being obtained by random mutagenesis of a limited number of
predetermined codons, is fused to a genetic element which
causes the resulting chimeric expression product to be
displayed on the outer surface of a virus (especially a
filamentous phage) or a cell. Affinity selection is then
used to identify viruses or cells whose genome includes
such a fused gene coding for a protein which bound to
the chromatographic target.
Description of the Related Art 

The amino acid sequence of a protein determines its
three-dimensional (3D) structure, which in turn determines
protein function. Some residues on the polypeptide chain
are more important than others in determining the 3D
structure of a protein, and hence its ability to bind,
non-covalently, but very tightly and specifically, to
characteristic target molecules.

"Protein engineering" is the art of manipulating the
sequence of a protein in order, e.g., to alter its binding
characteristics. The factors affecting protein binding are
known, but designing new complementary surfaces has proved
difficult. Quiocho et al. (QUIO87) suggest it is unlikely
that, using current protein engineering methods, proteins
can be constructed with binding properties superior to those
of proteins that occur naturally.

Nonetheless, there have been some isolated successes.
For example, Wilkinson et al. (WILK84) reported that a
mutant of the tyrosyl tRNA synthetase of Bacillus
stearothermophilus with the mutation Thr51 -> Pro exhibits a
100-fold increase in affinity for ATP.

With the development of recombinant DNA techniques, it
became possible to obtain a mutant protein by mutating the
gene encoding the native protein and then expressing the
mutated gene. Several mutagenesis strategies are known. One,
"protein surgery" (DILL87), involves the introduction of one
or more predetermined mutations within the gene of choice.
A single polypeptide of completely predetermined sequence is
expressed, and its binding characteristics are evaluated.

At the other extreme is random mutagenesis by means of
relatively nonspecific mutagens such as radiation and
various chemical agents. See Ho et al. (HOCJ85) and
Lehtovaara, EP Appln. 285,123.

It is possible to randomly vary predetermined nucleotides
using a mixture of bases in the appropriate cycles of
a nucleic acid synthesis procedure (OLIP86, OLIP87). The
proportion of bases in the mixture, for each position of a
codon, will determine the frequency at which each amino acid
will occur in the polypeptides expressed from the degenerate
DNA population (REID88a; VERS86a; VERS86b). The problem
of unequal abundance of DNA encoding different amino acids
is not discussed in these references.
Ferenci and collaborators have published a series of
papers on the chromatographic isolation of mutants of the
maltose-transport protein LamB of E. coli (FERE82a, FERE82b,
FERE83, FERE84, CLUN84, HEIN87 and papers cited therein).
The mutants were either spontaneous or induced with
nonspecific chemical mutagens. Levels of mutagenesis were picked
to provide single point mutations or single insertions of
two residues. No multiple mutations were sought or found.

While variation was seen in the degree of affinity for
the conventional LamB substrates maltose and starch, there
was no selection for affinity to a target molecule not bound
at all by native LamB, and no multiple mutations were sought
or found. FERE84 speculated that the affinity chromatographic
selection technique could be adapted to development
of similar mutants of other "important bacterial
surface-located enzymes", and to selecting for mutations which
result in the relocation of an intracellular bacterial
protein to the cell surface. Ferenci's mutant surface
proteins would not, however, have been chimeras of a
bacterial surface protein and an exogenous or heterologous
binding domain.

Ferenci also taught that there was no need to clone the
structural gene, or to know the protein structure, active
site, or sequence. The method of the present invention,
however, specifically utilizes a cloned structural gene. It
is not possible to construct and express a chimeric, outer
surface-directed potential binding protein-encoding gene
without cloning.

Ferenci did not limit the mutations to particular loci.
Substitutions were limited by the nature of the mutagen
rather than by the desirability of particular amino acid
types at a particular site. In the present invention,
knowledge of the protein structure, active site and/or
sequence is used as appropriate to predict which residues
are most likely to affect binding activity without unduly
destabilizing the protein, and the mutagenesis is focused
upon those sites. Ferenci does not suggest that surface
residues should be preferentially varied. In consequence,
Ferenci's selection system is much less efficient than that
disclosed herein.

A number of researchers have directed unmutated foreign
antigenic epitopes to the surface of bacteria or phage,
fused to a native bacterial or phage surface protein, and
demonstrated that the epitopes were recognized by antibodies.
Thus, Charbit, et al. (CHAR86a,b) genetically inserted
the C3 epitope of the VP1 coat protein of poliovirus into
the LamB outer membrane protein of E. coli, and determined
immunologically that the C3 epitope was exposed on the
bacterial cell surface. Charbit, et al. (CHAR87) likewise
produced chimeras of LamB and the A (or B) epitopes of the
preS2 region of hepatitis B virus.

A chimeric LacZ/OmpB protein has been expressed in
E. coli and is, depending on the fusion, directed to either the
outer membrane or the periplasm (SILH77). A chimeric
LacZ/OmpA surface protein has also been expressed and
displayed on the surface of E. coli cells (WEIN83). Others
have expressed and displayed on the surface of a cell
chimeras of other bacterial surface proteins, such as
E. coli type 1 fimbriae (HEDE89) and Bacteroides nodosus type
1 fimbriae (JENN89). In none of the recited cases was the
inserted genetic material mutagenized.

Dulbecco (DULB86) suggests a procedure for incorporating
a foreign antigenic epitope into a viral surface
protein so that the expressed chimeric protein is displayed
on the surface of the virus in a manner such that the
foreign epitope is accessible to antibody. In 1985 Smith
(SMIT85) reported inserting a nonfunctional segment of the
EcoRI endonuclease gene into gene III of bacteriophage f1,
"in phase". The gene III protein is a minor coat protein
necessary for infectivity. Smith demonstrated that the
recombinant phage were adsorbed by immobilized antibody
raised against the EcoRI endonuclease, and could be eluted
with acid. De la Cruz et al. (DELA88) have expressed a
fragment of the repeat region of the circumsporozoite
protein from Plasmodium falciparum on the surface of M13 as
an insert in the gene III protein. They showed that the
recombinant phage were both antigenic and immunogenic in
rabbits, and that such recombinant phage could be used for
B epitope mapping. The researchers suggest that similar






WO 92/15677 



PCT/US92/01456 




recombinant phage could be used for T epitope mapping and 
for vaccine development. 

None of these researchers suggested mutagenesis of the
inserted material, nor is the inserted material a complete
binding domain conferring on the chimeric protein the
ability to bind specifically to a receptor other than the
antigen combining site of an antibody.

McCafferty et al. (MCCA90) expressed a fusion of an Fv
fragment of an antibody to the N-terminus of the pIII
protein. The Fv fragment was not mutated.

Parmley and Smith (PARM88) suggested that an epitope
library that exhibits all possible hexapeptides could be
constructed and used to isolate epitopes that bind to
antibodies. In discussing the epitope library, the authors
did not suggest that it was desirable to balance the
representation of different amino acids. Nor did they teach
that the insert should encode a complete domain of the
exogenous protein. Epitopes are considered to be
unstructured peptides as opposed to structured proteins.

Scott and Smith (SCOT90) and Cwirla et al. (CWIR90)
prepared "epitope libraries" in which potential hexapeptide
epitopes for a target antibody were randomly mutated by
fusing degenerate oligonucleotides, encoding the epitopes,
with gene III of fd phage, and expressing the fused gene in
phage-infected cells. The cells manufactured fusion phage
which displayed the epitopes on their surface; the phage
which bound to immobilized antibody were eluted with acid
and studied. In both cases, the fused gene featured a
segment encoding a spacer region to separate the variable
region from the wild type pIII sequence so that the varied
amino acids would not be constrained by the nearby pIII
sequence. Devlin et al. (DEVL90) similarly screened, using
M13 phage, for random 15 residue epitopes recognized by
streptavidin. Again, a spacer was used to move the random
peptides away from the rest of the chimeric phage protein.
These references therefore taught away from constraining the
conformational repertoire of the mutated residues.

Another problem with the Scott and Smith, Cwirla et
al., and Devlin et al. libraries was that they provided a
highly biased sampling of the possible amino acids at each
position. Their primary concern in designing the degenerate
oligonucleotide encoding their variable region was to ensure
that all twenty amino acids were encodible at each position;
a secondary consideration was minimizing the frequency of
occurrence of stop signals. Consequently, Scott and Smith
and Cwirla et al. employed NNK (N = equal mixture of G, A, T,
C; K = equal mixture of G and T) while Devlin et al. used NNS
(S = equal mixture of G and C). There was no attempt to
minimize the frequency ratio of most-favored to least-favored
amino acid, or to equalize the rate of occurrence of
acidic and basic amino acids.
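
The bias described above can be checked by enumerating the codons that each degenerate scheme allows. The following is a minimal sketch, assuming the standard genetic code and exactly equimolar base mixtures; the function names, and the choice to count only Lys and Arg as basic and Asp and Glu as acidic, are illustrative assumptions and are not taken from the cited references.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Base mixtures as defined in the text (each assumed exactly equimolar).
MIXES = {"N": "GATC", "K": "GT", "S": "GC"}


def residue_frequencies(scheme):
    """Expected frequency of each amino acid (or stop) for a scheme such as 'NNK'."""
    codons = ["".join(c) for c in product(*(MIXES[s] for s in scheme))]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: Fraction(n, len(codons)) for aa, n in counts.items()}


def summarize(scheme):
    """Collect the quantities the text says were not optimized."""
    freq = residue_frequencies(scheme)
    amino = {aa: f for aa, f in freq.items() if aa != "*"}
    return {
        "amino_acids": len(amino),
        "stop": freq.get("*", Fraction(0)),
        "most_to_least": max(amino.values()) / min(amino.values()),
        "acidic": amino.get("D", 0) + amino.get("E", 0),  # Asp + Glu (assumed grouping)
        "basic": amino.get("K", 0) + amino.get("R", 0),   # Lys + Arg (assumed grouping)
    }


for scheme in ("NNK", "NNS"):
    print(scheme, summarize(scheme))
```

Under these assumptions both schemes encode all twenty amino acids with one stop codon in 32, but the most-favored residues occur three times as often as the least-favored, and basic residues are encoded twice as often as acidic ones.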

Devlin et al. characterized several affinity-selected
streptavidin-binding peptides, but did not measure the
affinity constants for these peptides. Cwirla et al. did
determine the affinity constants for their peptides, but were
disappointed to find that their best hexapeptides had
affinities (350-300 nM) "orders of magnitude" weaker than that of
the native Met-enkephalin epitope (7 nM) recognized by the
target antibody. Cwirla et al. speculated that phage
bearing peptides with higher affinities remained bound under
acidic elution, possibly because of multivalent interactions
between phage (carrying about 4 copies of pIII) and the
divalent target IgG. Scott and Smith were able to find
peptides whose affinity for the target antibody (A2) was
comparable to that of the reference myohemerythrin epitope
(50 nM). However, Scott and Smith likewise expressed concern
that some high-affinity peptides were lost, possibly through
irreversible binding of fusion phage to target.

Lam, et al. (LAM91) created a pentapeptide library by
nonbiological synthesis on solid supports. While they teach
that it is desirable to obtain the universe of possible
random pentapeptides in roughly equimolar proportions, they
deliberately excluded cysteine, to eliminate any possibility
of disulfide crosslinking.
Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sept. 1988
and having priority from US application 07/021,046, assigned
to Genex Corp.) (LGB) speculate that diverse single chain
antibody domains (SCAD) may be screened for binding to a
particular antigen by varying the DNA encoding the combining
determining regions of a single chain antibody, subcloning
the SCAD gene into the gpV gene of phage λ so that a
SCAD/gpV chimera is displayed on the outer surface of phage
λ, and selecting phage which bind to the antigen through
affinity chromatography. The only antigen mentioned is
bovine growth hormone. No other binding molecules, targets,
carrier organisms, or outer surface proteins are discussed.
Nor is there any mention of the method or degree of
mutagenesis. Furthermore, there is no teaching as to the
exact structure of the fusion, nor of how to identify a
successful fusion or how to proceed if the SCAD is not
displayed.

Ladner and Bird, WO88/06601 (publ. 7 September 1988)
suggest that single chain "pseudodimeric" repressors
(DNA-binding proteins) may be prepared by mutating a putative
linker peptide followed by in vivo selection, and that mutation
and selection may be used to create a dictionary of
recognition elements for use in the design of asymmetric
repressors. The repressors are not displayed on the outer surface
of an organism.

Methods of identifying residues in protein which can be
replaced with a cysteine in order to promote the formation
of a protein-stabilizing disulfide bond are given in
Pantoliano and Ladner, U.S. Patent No. 4,903,773 (PANT90),
Pantoliano and Ladner (PANT87), Pabo and Suchanek (PABO86),
MATS89, and SAUE86.

Ladner, et al., WO90/02809 describes semirandom
mutagenesis ("variegation") of known proteins displayed as
domains of semiartificial outer surface proteins of
bacteria, phage or spores, and affinity selection of mutants
having desired binding characteristics. The smallest
proteins specifically mentioned in WO90/02809 are crambin
(3:40, 4:32, 16:26 disulfides; 46 AAs), the third domain of
ovomucoid (8:38, 16:35 and 24:56 disulfides; 56 AAs), and
BPTI (5:55, 14:38, 30:51 disulfides; 58 AAs). WO90/02809
also specifically describes a strategy for "variegating" a
codon to obtain a mix of all twenty amino acids at that
position in approximately equal proportions.

Bass, et al. (BASS90) fused human growth hormone to the
gene III protein of M13 phage. They suggested that hGH and
other "large proteins" might be mutated and "binding
selections" applied.

SUMMARY OF THE INVENTION 
A polypeptide is a polymer composed of a single chain
of the same or different amino acids joined by peptide
bonds. Linear peptides can take up a very large number of
different conformations through internal rotations about the
main chain single bonds of each α carbon. These rotations
are hindered to varying degrees by side groups, with glycine
interfering the least, and valine, isoleucine and,
especially, proline, the most. A polypeptide of 20 residues may
have 10^20 different conformations which it may assume by
various internal rotations.

Proteins are polypeptides which, as a result of
stabilizing interactions between amino acids that are not
necessarily in adjacent positions in the chain, have folded
into a well-defined conformation. This folding is usually
essential to their biological activity.

For polypeptides of 40-60 residues or longer,
noncovalent forces such as hydrogen bonds, salt bridges, and
hydrophobic interactions are sufficient to stabilize a
particular folding or conformation. The polypeptide's
constituent segments are held to more or less that
conformation unless it is perturbed by a denaturant such as high
temperature, or low or high pH, whereupon the polypeptide
unfolds or "melts". The smaller the peptide, the more
likely it is that its conformation will be determined by the
environment. If a small unconstrained peptide has biological
activity, the peptide ligand will be in essence a random
coil until it comes into proximity with its receptor. The
receptor accepts the peptide only in one or a few
conformations because alternative conformations are disfavored by
unfavorable van der Waals and other non-covalent
interactions.

Small polypeptides have potential advantages over
larger polypeptides when used as therapeutic or diagnostic
agents, including (but not limited to):

a) better penetration into tissues,

b) faster elimination from the circulation (important for
imaging agents),

c) lower antigenicity, and

d) higher activity per mass.

Moreover, polypeptides, especially those of less than
about 40 residues, have the advantage of accessibility via
chemical synthesis; polypeptides of under about 30 residues
are particularly preferred. Thus, it would be desirable to
be able to employ the combination of mutation and affinity
selection to identify small polypeptides which bind a target
of choice.

Most polypeptides of this size, however, have
disadvantages as binding molecules. According to Olivera et al.
(OLIV90a): "Peptides in this size range normally equilibrate
among many conformations (in order to have a fixed
conformation, proteins generally have to be much larger)."
Specific binding of a peptide to a target molecule requires
the peptide to take up one conformation that is
complementary to the binding site. For a decapeptide with
three isoenergetic conformations (e.g., β strand, α helix,
and reverse turn) at each residue, there are about 6·10^4
(3^10) possible overall conformations. Assuming these
conformations to be equi-probable for the unconstrained decapeptide,
if only one of the possible conformations bound to the
binding site, then the affinity of the peptide for the
target would be expected to be about 6·10^4-fold higher if it
could be constrained to that single effective conformation.
Thus, the unconstrained decapeptide, relative to a
decapeptide constrained to the correct conformation, would
be expected to exhibit lower affinity. It would also
exhibit lower specificity, since one of the other
conformations of the unconstrained decapeptide might be one which
bound tightly to a material other than the intended target.
By way of corollary, it could have less resistance to
degradation by proteases, since it would be more likely to
provide a binding site for the protease.
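
The decapeptide estimate above is a simple counting argument, and can be reproduced as follows. This is a minimal sketch under the same simplifying assumptions as the text (independent residues, equally probable conformations, exactly one binding-competent conformation); the function names are illustrative only.

```python
def conformation_count(residues, states_per_residue):
    """Number of overall conformations if each residue independently
    adopts one of `states_per_residue` isoenergetic states."""
    return states_per_residue ** residues


def affinity_gain_if_constrained(residues, states_per_residue, binding_conformations=1):
    """Factor by which affinity would be expected to rise if the peptide were
    locked into a binding-competent conformation, assuming all conformations
    of the free peptide are equally populated."""
    return conformation_count(residues, states_per_residue) / binding_conformations


# Decapeptide with three states (beta strand, alpha helix, reverse turn) per residue.
n = conformation_count(10, 3)
print(n)                                    # 59049, i.e. about 6e4
print(affinity_gain_if_constrained(10, 3))  # same factor, as an affinity ratio
```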
The present invention overcomes these problems, while
retaining the advantages of smaller polypeptides, by
identifying novel mini-proteins having the desired binding
characteristics. Mini-proteins are small polypeptides
which, while too small to have a stable conformation as a
result of noncovalent forces alone, are covalently
crosslinked (e.g., by disulfide bonds) into a stable
conformation and hence have biological activities more
typical of larger protein molecules than of unconstrained
polypeptides of comparable size. The mini-proteins with
which the present invention is particularly concerned fall
into two categories: (a) disulfide-bonded micro-proteins of
less than 40 amino acids; and (b) metal ion-coordinated
mini-proteins of less than 60 amino acids.

The present invention relates to the construction,
expression, and selection of mutated genes that specify
novel mini-proteins with desirable binding properties, as
well as these mini-proteins themselves, and the "libraries"
of mutant "genetic packages" used to display the
mini-proteins to a potential "target" material. The "targets"
may be, but need not be, proteins. Targets may include
other biological or synthetic macromolecules as well as
other organic and inorganic substances.

The prior application, WO90/02809, generally teaches
that stable protein domains may be mutated in order to
identify new proteins with desirable binding
characteristics. Among the suitable "parental" proteins
which it specifically identifies as useful for this purpose
are three proteins (BPTI (58 residues), the third domain of
ovomucoid (56 residues), and crambin (46 residues)) which
are in the size range of 40-60 residues wherein noncovalent
interactions between nonadjacent amino acids become
significant; all three also contain three disulfide bonds
that enhance the stability of the molecule.

Nowhere in WO90/02809 does one find any specific
recognition that a polypeptide with less than 40 residues,
and especially those with only one or two disulfide bonds,
would have sufficient stability to serve as a "scaffolding"
for mutational variation. These "micro-proteins" are,
nonetheless, of great utility, as previously indicated.

WO90/02809 also suggests the use of a protein, azurin,
having a different form of crosslink (Cu:CYS,HIS,HIS,MET).
However, azurin has 128 amino acids, so it cannot possibly
be considered a mini-protein. The present invention
relates to the use of mini-proteins of less than 60 amino
acids which feature a metal ion-coordinated crosslink.

By virtue of the present invention, proteins are
obtained which can bind specifically to targets other than
the antigen-combining sites of antibodies. A protein is not
to be considered a "binding protein" merely because it can
be bound by an antibody (see definition of "binding protein"
which follows). While almost any amino acid sequence of
more than about 6-8 amino acids is likely, when linked to an
immunogenic carrier, to elicit an immune response, any given
random polypeptide is unlikely to satisfy the stringent
definition of "binding protein" with respect to minimum
affinity and specificity for its substrate. It is only by
testing numerous random polypeptides simultaneously (and, in
the usual case, controlling the extent and character of the
sequence variation, i.e., limiting it to residues of a
potential binding domain having a stable structure, the
[...] stability) that this obstacle is overcome.
The appended claims are hereby incorporated by reference
into this specification as an enumeration of the
preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the main chain of scorpion toxin (Brookhaven
Protein Data Bank entry 1SN3) residues 20 through 42.
CYS25 and CYS41 are shown forming a disulfide. In the
native protein these groups form disulfides to other
cysteines, but no main-chain motion is required to
bring the gamma sulphurs into acceptable geometry.
Residues, other than GLY, are labeled at the β carbon
with the one-letter code.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

I. INTRODUCTION

The fundamental principle of the invention is one of
forced evolution. In nature, evolution results from the
combination of genetic variation, selection for advantageous
traits, and reproduction of the selected individuals,
thereby enriching the population for the trait. The present
invention achieves genetic variation through controlled
random mutagenesis ("variegation") of DNA, yielding a
mixture of DNA molecules encoding different but related
potential binding domains that are mutants of
micro-proteins. It selects for mutated genes that specify novel
proteins with desirable binding properties by 1) arranging
that the product of each mutated gene be displayed on the
outer surface of a replicable genetic package (GP) (a cell,
spore or virus) that contains the gene, and 2) using
affinity selection (selection for binding to the target
material) to enrich the population of packages for those
packages containing genes specifying proteins with improved
binding to that target material. Finally, enrichment is
achieved by allowing only the genetic packages which, by
virtue of the displayed protein, bound to the target, to
reproduce. The evolution is "forced" in that selection is
for the target material provided and in that particular
codons are mutagenized at higher-than-natural frequencies.
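
The effect of repeating the selection and amplification cycle can be illustrated with a toy model. The sketch below is not part of the disclosed method: it assumes a population with only two classes of package (binders and non-binders), a fixed recovery probability for each class in every round, and amplification that preserves the recovered proportions. All numerical values are hypothetical.

```python
def enrich(binder_fraction, binder_recovery, background_recovery, rounds):
    """Return the binder fraction before selection and after each round.

    binder_recovery / background_recovery are the (assumed constant)
    probabilities that a binding / non-binding package survives one round
    of affinity selection.
    """
    history = [binder_fraction]
    f = binder_fraction
    for _ in range(rounds):
        kept_binders = f * binder_recovery
        kept_others = (1.0 - f) * background_recovery
        f = kept_binders / (kept_binders + kept_others)
        history.append(f)
    return history


# Hypothetical numbers: one binder per million packages, 10% of binders and
# 0.01% of non-binders recovered per round (a 1000-fold enrichment per cycle).
for round_number, fraction in enumerate(enrich(1e-6, 0.1, 1e-4, rounds=3)):
    print(round_number, f"{fraction:.6f}")
```

In this toy model the ratio of binders to non-binders is multiplied by the same factor in every cycle, which is why a rare binder can dominate the population after only a few rounds.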

The display strategy is first perfected by modifying a
genetic package to display a stable, structured domain (the
"initial potential binding domain", IPBD) for which an
affinity molecule (which may be an antibody) is obtainable.
The success of the modifications is readily measured by,
e.g., determining whether the modified genetic package binds
to the affinity molecule.

The IPBD is chosen with a view to its tolerance for
extensive mutagenesis. Once it is known that the IPBD can
be displayed on a surface of a package and subjected to
affinity selection, the gene encoding the IPBD is subjected
to a special pattern of multiple mutagenesis, here termed
"variegation", which after appropriate cloning and
amplification steps leads to the production of a population of
genetic packages each of which displays a single potential
binding domain (a mutant of the IPBD), but which
collectively display a multitude of different though structurally
related potential binding domains (PBDs). Each genetic
package carries the version of the pbd gene that encodes the
PBD displayed on the surface of that particular package.
Affinity selection is then used to identify the genetic
packages bearing the PBDs with the desired binding
characteristics, and these genetic packages may then be amplified.
After one or more cycles of enrichment by affinity selection
and amplification, the DNA encoding the successful binding
domains (SBDs) may then be recovered from selected packages.

If need be, the DNA from the SBD-bearing packages may
then be further "variegated", using an SBD of the last round
of variegation as the "parental potential binding domain"
(PPBD) to the next generation of PBDs, and the process
continued until the worker in the art is satisfied with the
result. Because of the structural and evolutionary
relationship between the IPBD and the first generation of
PBDs, the IPBD is also considered a "parental potential
binding domain" (PPBD).

When micro-proteins are variegated, the residues which
are covalently crosslinked in the parental molecule are left
unchanged, thereby stabilizing the conformation. For
example, in the variegation of a disulfide bonded
micro-protein, certain cysteines are invariant so that under the
conditions of expression and display, covalent crosslinks
(e.g., disulfide bonds between one or more pairs of
cysteines) form, and substantially constrain the
conformation which may be adopted by the hypervariable linearly
intermediate amino acids. In other words, a
scaffolding is engineered into polypeptides which are
otherwise extensively randomized.

Once a micro-protein of desired binding characteristics
is characterized, it may be produced, not only by
recombinant DNA techniques, but also by nonbiological
synthetic methods.

For the purposes of the appended claims, a protein P is
a "binding protein" if for at least one molecular, ionic or
atomic species A, other than the variable domain of an
antibody, the dissociation constant K_D(P,A) < 10^-6
moles/liter (preferably, < 10^-7 moles/liter).

The exclusion of "variable domain of an antibody" in
the definition above is intended to make clear that for the
purposes herein a protein is not to be considered a "binding
protein" merely because it is antigenic.
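
The definition above amounts to a threshold test on measured dissociation constants, with one class of partner excluded. A minimal sketch follows; the function name, the dictionary layout, and the example values are hypothetical and serve only to restate the definition.

```python
def is_binding_protein(kd_by_species, threshold=1e-6,
                       excluded=frozenset({"antibody variable domain"})):
    """Return True if, for at least one non-excluded species A,
    K_D(P, A) is below `threshold` (in moles/liter)."""
    return any(
        kd < threshold
        for species, kd in kd_by_species.items()
        if species not in excluded
    )


# Hypothetical dissociation constants, in moles/liter.
antigenic_only = {"antibody variable domain": 1e-9, "trypsin": 1e-3}
true_binder = {"antibody variable domain": 1e-9, "trypsin": 5e-8}

print(is_binding_protein(antigenic_only))               # tight antibody binding alone does not qualify
print(is_binding_protein(true_binder))                  # qualifies at the 1e-6 threshold
print(is_binding_protein(true_binder, threshold=1e-7))  # also meets the preferred threshold
```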

Most larger proteins fold into distinguishable globules
called domains (ROSS81). Protein domains have been defined
various ways; definitions of "domain" which emphasize
stability (retention of the overall structure in the face
of perturbing forces such as elevated temperatures or
chaotropic agents) are favored, though atomic coordinates
and protein sequence homology are not completely ignored.
When a domain of a protein is primarily responsible for
the protein's ability to specifically bind a chosen target,
it is referred to herein as a "binding domain" (BD).

The term "variegated DNA" (vgDNA) refers to a mixture
of DNA molecules of the same or similar length which, when
aligned, vary at some codons so as to encode at each such
codon a plurality of different amino acids, but which encode
only a single amino acid at other codon positions. It is
further understood that in variegated DNA, the codons which
are variable, and the range and frequency of occurrence of
the different amino acids which a given variable codon
encodes, are determined in advance by the synthesizer of the
DNA, even though the synthetic method does not allow one to
know, a priori, the sequence of any individual DNA molecule
in the mixture. The number of designated variable codons in
the variegated DNA is preferably no more than 20 codons, and
more preferably no more than 5-10 codons. The mix of amino
acids encoded at each variable codon may differ from codon
to codon.
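
The preference for a limited number of variable codons can be appreciated by counting the distinct sequences a given variegation pattern encodes. The sketch below simply multiplies the number of alternatives at each variable position; the per-position counts used (32 codons and 20 amino acids per position, as for an NNK-type codon) are assumptions chosen for illustration.

```python
import math


def library_size(options_per_position):
    """Number of distinct sequences when position i has options_per_position[i]
    alternatives and positions vary independently."""
    return math.prod(options_per_position)


# Assumed example: every variable codon allows 32 codons encoding 20 amino acids.
for n_codons in (5, 10, 20):
    dna_variants = library_size([32] * n_codons)
    protein_variants = library_size([20] * n_codons)
    print(n_codons, f"{dna_variants:.2e}", f"{protein_variants:.2e}")

# The mix may differ from codon to codon, e.g. 20, 4 and 2 residues at three positions.
print(library_size([20, 4, 2]))
```

Because the count grows exponentially with the number of variable codons, a population of practical size can sample essentially every encoded variant only when few codons are varied.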

A population of genetic packages into which variegated
DNA has been introduced is likewise said to be "variegated".

For the purposes of this invention, the term "potential
binding protein" (PBP) refers to a protein encoded by one
species of DNA molecule in a population of variegated DNA
wherein the region of variation appears in one or more
subsequences encoding one or more segments of the
polypeptide having the potential of serving as a binding domain for
the target substance.

A "chimeric protein" is a fusion of a first amino acid
sequence (protein) with a second amino acid sequence
defining a domain foreign to and not substantially
homologous with any domain of the first protein. A chimeric
protein may present a foreign domain which is found (albeit
in a different protein) in an organism which also expresses
the first protein, or it may be an "interspecies",
"intergeneric", etc. fusion of protein structures expressed
by different kinds of organisms.

One amino acid sequence of the chimeric proteins of the
present invention is typically derived from an outer surface
protein of a "genetic package" (GP) as hereafter defined.
One which displays a PBD on its surface is a GP(PBD). The
second amino acid sequence is one which, if expressed alone,
would have the characteristics of a protein (or a domain
thereof) but is incorporated into the chimeric protein as a
recognizable domain thereof. It may appear at the amino or
carboxy terminal of the first amino acid sequence (with or
without an intervening spacer), or it may interrupt the
first amino acid sequence. The first amino acid sequence
may correspond exactly to a surface protein of the genetic
package, or it may be modified, e.g., to facilitate the
display of the binding domain.



II. MICRO- AND OTHER MINI-PROTEINS

In the present invention, disulfide bonded
micro-proteins and metal-containing mini-proteins are used both
as IPBDs in verifying a display strategy, and as PPBDs in
actually seeking to obtain a BD with the desired
target-binding characteristics. Unless otherwise stated or
required by context, references herein to IPBDs should be
taken to apply, mutatis mutandis, to PPBDs as well.

For the purpose of the appended claims, a micro-protein
has between about six and about forty residues; micro-proteins
are a subset of mini-proteins, which have fewer than
about sixty residues. Since micro-proteins form a subset of
mini-proteins, for convenience the term mini-proteins will
be used on occasion to refer to both disulfide-bonded
micro-proteins and metal-coordinated mini-proteins.

The IPBD may be a mini-protein with a known binding
activity, or one which, while not possessing a known binding
activity, possesses a secondary or higher structure that
lends itself to binding activity (clefts, grooves, etc.).
When the IPBD does have a known binding activity, it need
not have any specific affinity for the target material. The
IPBD need not be identical in sequence to a naturally
occurring mini-protein; it may be a "homologue" with an
amino acid sequence which "substantially corresponds" to
that of a known mini-protein, or it may be wholly
artificial.

In determining whether sequences should be deemed to
"substantially correspond", one should consider the
following issues: the degree of sequence similarity when the
sequences are aligned for best fit according to standard
algorithms, the similarity in the connectivity patterns of
any crosslinks (e.g., disulfide bonds), the degree to which
the proteins have similar three-dimensional structures, as
indicated by, e.g., X-ray diffraction analysis or NMR, and
the degree to which the sequenced proteins have similar
biological activity. In this context, it should be noted
that among the serine protease inhibitors, there are
families of proteins recognized to be homologous in which
there are pairs of members with as little as 30% sequence
homology.
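
By way of illustration only, the first of the issues listed above (sequence similarity after alignment) may be summarized as a percent identity. The following is a minimal sketch and not part of the original disclosure: the function name percent_identity and the use of "-" as a gap character are assumptions, and the sequences are taken to be aligned already by some standard algorithm.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned (non-gap) columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Only columns in which neither sequence has a gap are compared.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

A value near 30% would, on this measure alone, be comparable to the most distant homologous pairs mentioned above; the other listed criteria (crosslink connectivity, 3D structure, activity) are not captured by this number.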

A candidate IPBD should meet the following criteria:

1) a domain exists that will remain stable under the
conditions of its intended use (the domain may
comprise the entire protein that will be inserted,
e.g., α-conotoxin GI (OLIV90a), or CMTI-III (MCWH89)),

2) knowledge of the amino acid sequence is obtainable,
and

3) a molecule is obtainable having specific and high
affinity for the IPBD, abbreviated AfM(IPBD).

If only one species of molecule having affinity for
IPBD (AfM(IPBD)) is available, it will be used to: a) detect
the IPBD on the GP surface, b) optimize expression level and
density of the affinity molecule on the matrix, and c)
determine the efficiency and sensitivity of the affinity
separation. One would prefer to have available two species
of AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be used
in initial detection and in determining efficiency and
sensitivity, and the species with moderate affinity would be
used in optimization.

If the IPBD is not itself a known binding protein, or
if its native target has not been purified, an antibody
raised against the IPBD may be used as the affinity
molecule. Use of an antibody for this purpose should not be
taken to mean that the antibody is the ultimate target.

There are many candidate IPBDs for which all of the
above information is available or is reasonably practical to
obtain, for example, CMTI-III (29 residues) (CMTI-type
inhibitors are described in OTLE87, FAVE89, WIEC85, MCWH89,
BODE89, HOLA89a,b), heat-stable enterotoxin (ST-Ia of E.
coli) (18 residues) (GUAR89, BHAT86, SEKI85, SHIM87, TAKA85,
TAKE90, THOM85a,b, YOSH85, DALL90, DWAR89, GARI87, GUZM89,
GUZM90, HOUG84, KOBO89, KUPE90, OKAM87, OKAM88, and OKAM90),
α-Conotoxin GI (13 residues) (HASH85, ALMQ89), μ-Conotoxin
GIII (22 residues) (HIDO90), and Conus King Kong micro-protein
(27 residues) (WOOD90). Structural information can
be obtained from X-ray or neutron diffraction studies, NMR,
chemical crosslinking or labeling, modeling from known
structures of related proteins, or from theoretical
calculations. 3D structural information obtained by X-ray
diffraction, neutron diffraction or NMR is preferred because
these methods allow localization of almost all of the atoms
to within defined limits. Table 50 lists several preferred
IPBDs.

Mutations may reduce the stability of the PBD. Hence
the chosen IPBD should preferably have a high melting
temperature, e.g., at least 50°C, and preferably be stable
over a wide pH range, e.g., 8.0 to 3.0, but more preferably
11.0 to 2.0, so that the SBDs derived from the chosen IPBD
by mutation and selection-through-binding will retain
sufficient stability. Preferably, the substitutions in the
IPBD yielding the various PBDs do not reduce the melting
point of the domain below ~40°C. It will be appreciated
that mini-proteins that contain covalent crosslinks, such as
one or more disulfides, are therefore likely to be
sufficiently stable.

In vitro, disulfide bridges can form spontaneously in
polypeptides as a result of air oxidation. Matters are more
complicated in vivo. Very few intracellular proteins have
disulfide bridges, probably because a strong reducing
environment is maintained by the glutathione system.
Disulfide bridges are common in proteins that travel or
operate in extracellular spaces, such as snake venoms and
other toxins (e.g., conotoxins, charybdotoxin, bacterial
enterotoxins), peptide hormones, digestive enzymes,
complement proteins, immunoglobulins, lysozymes, protease
inhibitors (BPTI and its homologues, CMTI-III (Cucurbita
maxima trypsin inhibitor III) and its homologues, hirudin,
etc.) and milk proteins.

Disulfide bonds that close tight intrachain loops have
been found in pepsin, thioredoxin, insulin A-chain, silk
fibroin, and lipoamide dehydrogenase. The bridged cysteine
residues are separated by one to four residues along the
polypeptide chain. Model building, X-ray diffraction
analysis, and NMR studies have shown that the α-carbon path
of such loops is usually flat and rigid.

There are two types of disulfide bridges in immunoglobulins.
One is the conserved intrachain bridge, spanning
about 60 to 70 amino acid residues and found, repeatedly, in
almost every immunoglobulin domain. Buried deep between the
opposing β sheets, these bridges are shielded from solvent
and ordinarily can be reduced only in the presence of
denaturing agents. The remaining disulfide bridges are
mainly interchain bonds and are located on the surface of
the molecule; they are accessible to solvent and relatively
easily reduced (STEI85). The disulfide bridges of the
micro-proteins of the present invention are intrachain
linkages between cysteines having much smaller chain
spacings.

When a micro-protein contains a plurality of disulfide
bonds, it is preferable that at least two cysteines be
clustered, i.e., are immediately adjacent along the chain
(-C-C-) or are separated by a single amino acid (-C-X-C-). In
either case, the two clustered cysteines become unable to
pair with each other for steric reasons, and the number of
realizable topologies is reduced.
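
The effect of clustering on the number of realizable topologies can be illustrated by simple enumeration. The sketch below is offered only as an aid to understanding: the function names and the min_separation parameter are hypothetical, and the rule applied (a pair is excluded when the two cysteines are adjacent or separated by a single residue) is the steric rule stated above, not a full structural model.

```python
def pairings(items):
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def realizable_topologies(cys_positions, min_separation=3):
    """Pairings in which no bonded pair is closer than `min_separation`.

    min_separation=3 excludes -C-C- (positions differ by 1) and -C-X-C-
    (positions differ by 2), following the clustering rule in the text.
    """
    cys = sorted(cys_positions)
    return [p for p in pairings(cys)
            if all(b - a >= min_separation for a, b in p)]
```

For four well-separated cysteines (e.g., hypothetical positions 3, 8, 12, 16) all three pairings survive; if two of them are clustered (e.g., positions 3 and 4), the pairing that would bond the clustered cysteines to each other is removed, leaving two.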

An intrachain disulfide bridge connecting amino acids
3 and 8 of a 16 residue polypeptide will be said herein to
have a span of 4. If amino acids 4 and 12 are also
disulfide bonded, then they form a second span of 7.
Together, the four cysteines divide the polypeptide into
four intercysteine segments (1-2, 5-7, 9-11, and 13-16).
(Note that there is no segment between Cys3 and Cys4.) The
connectivity pattern of a crosslinked micro-protein is a
simple description of the relative location of the termini
of the crosslinks. For example, for a micro-protein with
two disulfide bonds, the connectivity pattern "1-3, 2-4"
means that the first crosslinked cysteine is disulfide
bonded to the third crosslinked cysteine (in the primary
sequence), and the second to the fourth.
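
The definitions of span, intercysteine segment, and connectivity pattern given above can be expressed compactly as follows. This is a minimal sketch using 1-based residue numbering; the function names are illustrative assumptions and do not appear elsewhere in this specification.

```python
def span(i: int, j: int) -> int:
    """Number of residues strictly between bonded positions i and j."""
    return abs(j - i) - 1


def intercysteine_segments(length: int, cys_positions):
    """Runs of non-cysteine residues as (start, end), 1-based inclusive."""
    segments = []
    start = 1
    for c in sorted(cys_positions):
        if c > start:
            segments.append((start, c - 1))
        start = c + 1
    if start <= length:
        segments.append((start, length))
    return segments


def connectivity_pattern(bonds):
    """Describe bonds by the rank of each cysteine in the primary sequence."""
    ranked = sorted(p for bond in bonds for p in bond)
    rank = {pos: n + 1 for n, pos in enumerate(ranked)}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ", ".join(f"{a}-{b}" for a, b in pairs)
```

Applied to the 16-residue example in the text, with bonds (3, 8) and (4, 12), these functions give spans of 4 and 7, the segments 1-2, 5-7, 9-11, and 13-16, and the connectivity pattern "1-3, 2-4".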

The degree to which the crosslink constrains the
conformational freedom of the mini-protein, and the degree
to which it stabilizes the mini-protein, may be assessed by
a number of means. These include absorption spectroscopy
(which can reveal whether an amino acid is buried or
exposed), circular dichroism studies (which provide a
general picture of the helical content of the protein),
nuclear magnetic resonance (NMR) spectroscopy (which reveals
the number of nuclei in a particular chemical environment as
well as the mobility of nuclei), and X-ray or neutron diffraction
analysis of protein crystals. The stability of the
mini-protein may be ascertained by monitoring the changes in
absorption at various wavelengths as a function of
temperature, pH, etc.: buried residues become exposed as the
protein unfolds. Similarly, the unfolding of the
mini-protein as a result of denaturing conditions results in
changes in NMR line positions and widths. Circular
dichroism (CD) spectra are extremely sensitive to
conformation.

The variegated disulfide-bonded micro-proteins of the
present invention fall into several classes.

Class I micro-proteins are those featuring a single
pair of cysteines capable of interacting to form a disulfide
bond, said bond having a span of no more than about nine
residues. This disulfide bridge preferably has a span of at
least two residues; this is a function of the geometry of
the disulfide bond. When the spacing is two or three
residues, one residue is preferably glycine in order to reduce
the strain on the bridged residues. The upper limit on
spacing is less precise; however, in general, the greater
the spacing, the less the constraint on conformation imposed
on the linearly intermediate amino acid residues by the
disulfide bond.

The main chain of such a peptide has very little
freedom, but is not stressed. The free energy released when
the disulfide forms exceeds the free energy lost by the
main chain when locked into a conformation that brings the
cysteines together. Having lost the free energy of
disulfide formation, the proximal ends of the side groups
are held in more or less fixed relation to each other. When
binding to a target, the domain does not need to expend free
energy getting into the correct conformation. The domain
cannot jump into some other conformation and bind a
non-target.

A disulfide bridge with a span of 4 or 5 is especially
preferred. If the span is increased to 6, the constraining
influence is reduced. In this case, we prefer that at least
one of the enclosed residues be an amino acid that imposes
restrictions on the main-chain geometry. Proline imposes
the most restriction. Valine and isoleucine restrict the
main chain to a lesser extent. The preferred position for
this constraining non-cysteine residue is adjacent to one of
the invariant cysteines; however, it may be one of the other
bridged residues. If the span is seven, we prefer to
include two amino acids that limit main-chain conformation.
These amino acids could be at any of the seven positions,
but are preferably the two bridged residues that are
immediately adjacent to the cysteines. If the span is eight
or nine, additional constraining amino acids may be
provided.

While a class I micro-protein may have up to 40 amino
acids, more preferably it is no more than 20 amino acids.

The disulfide bond of a class I micro-protein is
exposed to solvent. Thus, one usually should avoid exposing
the variegated population of GPs that display class I
micro-proteins to reagents that rupture disulfides.

Class II micro-proteins are those featuring a single
disulfide bond having a span of greater than nine amino
acids. The bridged amino acids form secondary structures
which help to stabilize their conformation. Preferably,
these intermediate amino acids form hairpin supersecondary
structures such as those schematized below:

    -Cys-α helix-turn-β strand-Cys-

    -Cys-α helix-turn-α helix-Cys-

    -Cys-β strand-turn-β strand-Cys-

(in each schematic, the two Cys residues are joined by an S-S bridge)

Based on studies of known proteins, one may calculate
the propensity of a particular residue, or of a particular
dipeptide or tripeptide, to be found in an α helix, β strand
or reverse turn. The normalized frequencies of occurrence
of the amino acid residues in these secondary structures are
given in Table 6-4 of CREI84. For a more detailed treatment
of the prediction of secondary structure from the amino acid
sequence, see Chapter 6 of SCHU79.

In designing a suitable hairpin structure, one may copy
an actual structure from a protein whose three-dimensional
conformation is known, design the structure using frequency
data, or combine the two approaches. Preferably, one or
more actual structures are used as a model, and the
frequency data are used to determine which mutations can be
made without disrupting the structure.

Preferably, no more than three amino acids lie between
the cysteine and the beginning or end of the α helix or β
strand.

More complex structures (such as a double hairpin) are
also possible.

Class IIIa micro-proteins are those featuring two
disulfide bonds. They optionally may also feature secondary
structures such as those discussed above with regard to
Class II micro-proteins. With two disulfide bonds, there
are three possible topologies; if desired, the number of
realizable disulfide bonding topologies may be reduced by
clustering cysteines as in heat-stable enterotoxin ST-Ia.

Class IIIb micro-proteins are those featuring three or
more disulfide bonds and preferably at least one cluster of
cysteines as previously described.

Metal Finger Mini-Proteins. The present invention also
relates to mini-proteins which are not crosslinked by
disulfide bonds, e.g., analogues of finger proteins. Finger
proteins are characterized by finger structures in which a
metal ion is coordinated by two Cys and two His residues,
forming a tetrahedral arrangement around it. The metal ion
is most often zinc(II), but may be iron, copper, cobalt,
etc. The "finger" has the consensus sequence (Phe or Tyr)-
(1 AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-
His-(3 AAs)-His-(5 AAs) (BERG88; GIBS88). While finger
proteins typically contain many repeats of the finger motif,
it is known that a single finger will fold in the presence
of zinc ions (FRAN87; PARR88). There is some dispute as to
whether two fingers are necessary for binding to DNA. The
present invention encompasses mini-proteins with either one
or two fingers. Other combinations of side groups can lead
to formation of crosslinks involving multivalent metal ions.
Summers (SUMM91), for example, reports an 18-amino-acid
mini-protein found in the capsid protein of HIV-1-F1 and having
three cysteines and one histidine that bind a zinc atom. It
is to be understood that the target need not be a nucleic
acid.
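
For illustration, the finger consensus recited above may be written as a pattern over one-letter amino acid codes and used to scan a sequence. The following is only a sketch: the names FINGER_CONSENSUS and find_fingers are hypothetical, the pattern encodes nothing beyond the spacing stated in the consensus, and a match does not by itself establish that a segment folds or binds metal.

```python
import re

# (Phe or Tyr)-(1 AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-
# His-(3 AAs)-His-(5 AAs), written in one-letter amino acid code.
# "." stands for any single residue.
FINGER_CONSENSUS = re.compile(
    r"[FY].C.{2,4}C.{3}F.{5}L.{2}H.{3}H.{5}"
)


def find_fingers(sequence: str):
    """Return (start, end) 0-based half-open spans matching the consensus."""
    return [m.span() for m in FINGER_CONSENSUS.finditer(sequence.upper())]
```

Because the Cys-Cys spacer may be two to four residues, a matching segment is 28 to 30 residues long. Residues not fixed by the consensus are the natural candidates for variegation.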

6. Modified PBDs

There exist a number of enzymes and chemical reagents
that can selectively modify certain side groups of proteins,
including: protein-tyrosine kinase, Ellman's reagent,
methyltransferases (that methylate GLU side groups), serine
kinases, proline hydroxylases, vitamin-K dependent enzymes
that convert GLU to GLA, maleic anhydride, and alkylating
agents. Treatment of the variegated population of GP(PBD)s
with one of these enzymes or reagents will modify the side
groups affected by the chosen enzyme or reagent. Enzymes
and reagents that do not kill the GP are much preferred.
Such modification of side groups can directly affect the
binding properties of the displayed PBDs. Using affinity
separation methods, we enrich for the modified GPs that bind
the predetermined target. Since the active binding domain
is not entirely genetically specified, we must repeat the
post-morphogenesis modification at each enrichment round.
This approach is particularly appropriate with mini-protein
IPBDs because we envision chemical synthesis of these SBDs.

III. VARIEGATION STRATEGY: MUTAGENESIS TO OBTAIN POTENTIAL
BINDING DOMAINS WITH DESIRED DIVERSITY

III.A. Generally

When the number of different amino acid sequences
obtainable by mutation of the domain is large when compared
to the number of different domains which are displayable in
detectable amounts, the efficiency of the forced evolution
is greatly enhanced by careful choice of which residues are
to be varied. First, residues of a known protein which are
likely to affect its binding activity (e.g., surface
residues) and not likely to unduly degrade its stability are
identified. Then all or some of the codons encoding these
residues are varied simultaneously to produce a variegated
population of DNA. Groups of surface residues that are
close enough together on the surface to touch one molecule
of target simultaneously are preferred sets for simultaneous
variegation. The variegated population of DNA is used to
express a variety of potential binding domains, whose
ability to bind the target of interest may then be
evaluated.

The method of the present invention is thus further
distinguished from other methods in the nature of the highly
variegated population that is produced and from which novel
binding proteins are selected. We force the displayed
potential binding domain to sample the nearby "sequence
space" of related amino-acid sequences in an efficient,
organized manner. Four goals guide the various variegation
plans used herein, preferably: 1) a very large number (e.g.,
10^7) of variants is available, 2) a very high percentage of
the possible variants actually appears in detectable
amounts, 3) the frequency of appearance of the desired
variants is relatively uniform, and 4) variation occurs only
at a limited number of amino-acid residues, most preferably
at residues having side groups directed toward a common
region on the surface of the potential binding domain.

This is to be distinguished from the simple use of
indiscriminate mutagenic agents such as radiation and
hydroxylamine to modify a gene, where there is no (or very
oblique) control over the site of mutation. Many of the
mutations will affect residues that are not a part of the
binding domain. When chemical mutagens are directed toward
the whole genome, most mutations occur in genes other than
the one encoding the potential binding domain. Moreover,
since at a reasonable level of mutagenesis, any modified
codon is likely to be characterized by a single base change,
only a limited and biased range of possibilities will be
explored. Equally remote is the use of site-specific
mutagenesis techniques employing mutagenic oligonucleotides
of nonrandomized sequence, since these techniques do not
lend themselves to the production and testing of a large
number of variants. While focused random mutagenesis
techniques are known, the importance of controlling the
distribution of variation has been largely overlooked.

The potential binding domains are first designed at the
amino acid level. Once we have identified which residues
are to be mutagenized, and which mutations to allow at those
positions, we may then design the variegated DNA which is to
encode the various PBDs so as to assure that there is a
reasonable probability that if a PBD has an affinity for the
target, it will be detected. Of course, the number of
independent transformants obtained and the sensitivity of
the affinity separation technology will impose limits on the
extent of variegation possible within any single round of
variegation.

There are many ways to generate diversity in a protein.
(See RICH86, CARU85, and OLIP86.) At one extreme, we vary
a few residues of the protein as much as possible (inter
alia see CARU85, CARU87, RICH86, and WHAR86). We will call
this approach "Focused Mutagenesis". A typical "Focused
Mutagenesis" strategy is to pick a set of five to seven
residues and vary each through 13-20 possibilities. An
alternative plan of mutagenesis ("Diffuse Mutagenesis") is
to vary many more residues through a more limited set of
choices (See VERS86a and PAKU86). The variegation pattern
adopted may fall between these extremes, e.g., two residues
varied through all twenty amino acids, two more through only
two possibilities, and a fifth into ten of the twenty amino
acids.
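
The number of distinct amino acid sequences produced by a given variegation pattern is the product of the number of possibilities at each varied residue. The short sketch below works this out for the patterns just described; the function name library_size is an illustrative assumption, and the "focused" bounds simply take the smallest and largest cases quoted in the text (five residues at 13 choices, seven residues at 20 choices).

```python
from math import prod


def library_size(choices_per_residue):
    """Number of distinct amino acid sequences: product over varied residues."""
    return prod(choices_per_residue)


# Intermediate pattern from the text: two residues through all twenty amino
# acids, two through two possibilities, and one through ten.
intermediate = library_size([20, 20, 2, 2, 10])

# Bounds of a typical "Focused Mutagenesis" plan.
focused_low = library_size([13] * 5)
focused_high = library_size([20] * 7)
```

The intermediate pattern gives 16,000 sequences, while focused plans range from about 3.7 x 10^5 to about 1.3 x 10^9 sequences, which shows how quickly the upper end can exceed the number of independent transformants obtainable.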

There is no fixed limit on the number of codons which
can be mutated simultaneously. However, it is desirable to
adopt a mutagenesis strategy which results in a reasonable
probability that a possible PBD sequence is in fact
displayed by at least one genetic package. Preferably, the
probability that a mutein encoded by the vgDNA and composed
of the least favored amino acids at each variegated position
will be displayed by at least one independent transformant
in the library is at least 0.50, and more preferably at
least 0.90. (Muteins composed of more favored amino acids
would of course be more likely to occur in the same
library.)
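
One simple way to estimate this probability, offered here as an assumption rather than as the method of the specification, is to treat each independent transformant as an independent draw in which the least favored sequence appears with frequency p. The probability of at least one occurrence among n transformants is then 1 - (1 - p)^n. The function and parameter names below are hypothetical.

```python
import math


def prob_displayed(p_sequence: float, n_transformants: int) -> float:
    """P(at least one of n independent transformants carries the sequence)."""
    return 1.0 - (1.0 - p_sequence) ** n_transformants


def transformants_needed(p_sequence: float, target_prob: float) -> int:
    """Smallest n for which prob_displayed(p_sequence, n) >= target_prob."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - p_sequence))
```

Under this model, a least favored sequence present at a frequency of 10^-6 in the vgDNA would require roughly 7 x 10^5 independent transformants for a probability of 0.50 and roughly 2.3 x 10^6 for 0.90.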

Preferably, the variegation is such as will cause a
typical transformant population to display 10^6-10^7 different
amino acid sequences by means of preferably not more than
10-fold more (more preferably not more than 3-fold)
different DNA sequences.

For a Class I micro-protein that lacks α helices and β
strands, one will, in any given round of mutation,
preferably variegate each of 4-8 non-cysteine codons so that
they each encode at least eight of the 20 possible amino
acids. The variegation at each codon could be customized to
that position. Preferably, cysteine is not one of the
potential substitutions, though it is not excluded.

When the mini-protein is a metal finger protein, in a
typical variegation strategy, the two Cys and two His
residues, and optionally also the aforementioned Phe/Tyr,
Phe and Leu residues, are held invariant and a plurality
(usually 5-10) of the other residues are varied.

When the micro-protein is of the type featuring one or
more α helices and β strands, the set of potential amino
acid modifications at any given position is picked to favor
those which are less likely to disrupt the secondary
structure at that position. Since the number of possibilities
at each variable amino acid is more limited, the total
number of variable amino acids may be greater without
altering the sampling efficiency of the selection process.

For class III micro-proteins, preferably not more than
20 and more preferably 5-10 codons will be variegated.
However, if diffuse mutagenesis is employed, the number of
codons which are variegated can be higher.

While variegation normally will involve the substitution
of one amino acid for another at a designated variable
codon, it may involve the insertion or deletion of amino
acids as well.

III.B. Identification of Residues to be Varied

We now consider the principles that guide our choice of
residues of the IPBD to vary. A key concept is that only
structured proteins exhibit specific binding, i.e., can bind
to a particular chemical entity to the exclusion of most
others. Thus the residues to be varied are chosen with an
eye to preserving the underlying IPBD structure.
Substitutions that prevent the PBD from folding will cause
GPs carrying those genes to bind indiscriminately so that
they can easily be removed from the population.

Substitutions of amino acids that are exposed to solvent are
less likely to affect the 3D structure than are
substitutions at internal loci. (See PAKU86, REID88a,
EISE85, SCHU79, p169-171 and CREI84, p239-245, 314-315).
Internal residues are frequently conserved and the amino
acid type cannot be changed to a significantly different
type without substantial risk that the protein structure
will be disrupted. Nevertheless, some conservative changes
of internal residues, such as I to L or F to Y, are
tolerated. Such conservative changes subtly affect the
placement and dynamics of adjacent protein residues and such
"fine tuning" may be useful once an SBD is found. Insertions
and deletions are more readily tolerated in loops than
elsewhere (THOR88).

Data about the IPBD and the target that are useful in
deciding which residues to vary in the variegation cycle
include: 1) 3D structure, or at least a list of residues on
the surface of the IPBD, 2) list of sequences homologous to
IPBD, and 3) model of the target molecule or a stand-in for
the target.

III.C. Determining the Substitution Set for Each Parental
Residue

Having picked which residues to vary, we now decide the
range of amino acids to allow at each variable residue. The
total level of variegation is the product of the number of
variants at each varied residue. Each varied residue can
have a different scheme of variegation, producing 2 to 20
different possibilities. The set of amino acids which are
potentially encoded by a given variegated codon is called
its "substitution set".

The computer that controls a DNA synthesizer, such as
the Milligen 7500, can be programmed to synthesize any base
of an oligo-nt with any distribution of nts by taking some
nt substrates (e.g., nt phosphoramidites) from each of two or
more reservoirs. Alternatively, nt substrates can be mixed
in any ratios and placed in one of the extra reservoirs for
so-called "dirty bottle" synthesis. Each codon could be
programmed differently. The "mix" of bases at each
nucleotide position of the codon determines the relative
frequency of occurrence of the different amino acids encoded
by that codon.

Simply variegated codons are those in which those
nucleotide positions which are degenerate are obtained from
a mixture of two or more bases mixed in equimolar proportions.
These mixtures are described in this specification
by means of the standardized "ambiguous nucleotide" code.
In this code, for example, in the degenerate codon "SNT",
"S" denotes an equimolar mixture of bases G and C, "N", an
equimolar mixture of all four bases, and "T", the single
invariant base thymidine.
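
The substitution set of a simply variegated codon can be worked out mechanically by expanding each ambiguous nucleotide and translating the resulting codons. The sketch below uses the standard IUPAC ambiguity symbols and the standard genetic code; the function names expand and substitution_set are illustrative assumptions and are not terms used elsewhere in this specification.

```python
from itertools import product

# IUPAC ambiguity codes for nucleotides.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand(degenerate_codon: str):
    """List every concrete codon covered by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def substitution_set(degenerate_codon: str):
    """Map each encoded amino acid to the number of codons that encode it."""
    counts = {}
    for codon in expand(degenerate_codon):
        aa = CODON_TABLE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return counts
```

For "SNT" the expansion gives eight codons (CAT, CCT, CGT, CTT, GAT, GCT, GGT, GTT), encoding H, P, R, L, D, A, G, and V once each. With equimolar mixtures, the count returned for each amino acid is proportional to its expected frequency at that position.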

Complexly variegated codons are those in which at least
one of the three positions is filled by a base from a
mixture of two or more bases in other than equimolar proportions.

Either simply or complexly variegated codons may be 
used to achieve the desired substitution set. 

If we have no information indicating that a particular
amino acid or class of amino acid is appropriate, we strive
to substitute all amino acids with equal probability because
representation of one mini-protein above the detectable
level is wasteful. Equal amounts of all four nts at each
position in a codon (NNN) yield the amino acid distribution
in which each amino acid is present in proportion to the
number of codons that code for it. This distribution has
the disadvantage of giving two basic residues for every
acidic residue. In addition, six times as much R, S, and L
as W or M occur. If five codons are synthesized with this
distribution, each of the 243 sequences encoding some
combination of L, R, and S is 7776 times more abundant than
each of the 32 sequences encoding some combination of W and
M. To have five Ws present at detectable levels, we must
have each of the (L,R,S) sequences present in 7776-fold
excess.
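
The figures quoted above follow directly from the codon counts of the standard genetic code and can be checked as shown below. This is an illustrative sketch only; the names codons_per_aa and relative_abundance are assumptions, and "basic" is taken here to mean K and R, which is the reading that reproduces the two-to-one ratio stated in the text.

```python
from collections import Counter
from fractions import Fraction

# Standard genetic code as a 64-letter string (bases ordered T, C, A, G);
# each letter is the amino acid for one codon and "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# With NNN, every codon is equally likely, so the frequency of an amino
# acid is proportional to the number of codons that encode it.
codons_per_aa = Counter(AMINO)


def relative_abundance(favored: str, disfavored: str, n_codons: int) -> Fraction:
    """Ratio of the frequency of a sequence made only of `favored` residues
    to one made only of `disfavored` residues, over n_codons NNN codons."""
    ratio = Fraction(codons_per_aa[favored], codons_per_aa[disfavored])
    return ratio ** n_codons


basic = codons_per_aa["K"] + codons_per_aa["R"]    # 2 + 6 codons
acidic = codons_per_aa["D"] + codons_per_aa["E"]   # 2 + 2 codons
n_lrs_sequences = 3 ** 5   # five positions, each L, R, or S
n_wm_sequences = 2 ** 5    # five positions, each W or M
```

L, R, and S each have six codons and W and M one each, so the per-position ratio of 6 compounds to 6^5 = 7776 over five codons.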

Particular amino acid residues can influence the
tertiary structure of a defined polypeptide in several ways,
including by:

a) affecting the flexibility of the polypeptide main
chain,

b) adding hydrophobic groups,

c) adding charged groups,

d) allowing hydrogen bonds, and

e) forming cross-links, such as disulfides, chelation to
metal ions, or bonding to prosthetic groups.

Lundeen (LUND86) has tabulated the frequencies of amino
acids in helices, β strands, turns, and coil in proteins of
known 3D structure and has distinguished between CYSs having
free thiol groups and half cystines. He reports that free
CYS is found most often in helices while half cystines are
found more often in β sheets. Half cystines are, however,
regularly found in helices. Pease et al. (PEAS90)
constructed a peptide having two cystines; one end of each
is in a very stable α helix. Apamin has a similar structure
(WEMM83, PEAS88).
Flexibility:

GLY is the smallest amino acid, having two hydrogens
attached to the Cα. Because GLY has no Cβ, it confers the
most flexibility on the main chain. Thus GLY occurs very
frequently in reverse turns, particularly in conjunction
with PRO, ASP, ASN, SER, and THR.

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, PHE,
TYR, TRP, ARG, HIS, GLU, GLN, and LYS have unbranched β
carbons. Of these, the side groups of SER, ASP, and ASN
frequently make hydrogen bonds to the main chain and so can
take on main-chain conformations that are energetically
unfavorable for the others. VAL, ILE, and THR have branched
β carbons, which makes the extended main-chain conformation
more favorable. Thus VAL and ILE are most often seen in β
sheets. Because the side group of THR can easily form
hydrogen bonds to the main chain, it has less tendency to
exist in a β sheet.

The main chain of proline is particularly constrained
by the cyclic side group. The φ angle is always close to
-60°. Most prolines are found near the surface of the
protein.
Charge: 

LYS and ARG carry a single positive charge at any pH
below 10.4 or 12.0, respectively. Nevertheless, the
methylene groups, four and three respectively, of these
amino acids are capable of hydrophobic interactions. The
guanidinium group of ARG is capable of donating five
hydrogens simultaneously, while the amino group of LYS can
donate only three. Furthermore, the geometries of these
groups are quite different, so that these groups are often
not interchangeable.

ASP and GLU carry a single negative charge at any pH
above ~4.5 and 4.6, respectively. Because ASP has but one
methylene group, few hydrophobic interactions are possible.
The geometry of ASP lends itself to forming hydrogen bonds
to main-chain nitrogens, which is consistent with ASP being
found very often in reverse turns and at the beginning of
helices. GLU is more often found in α helices and
particularly in the amino-terminal portion of these helices
because the negative charge of the side group has a
stabilizing interaction with the helix dipole (NICH88,
SALI88).

HIS has an ionization pK in the physiological range,
viz. 6.2. This pK can be altered by the proximity of
charged groups or of hydrogen donators or acceptors. HIS is
capable of forming bonds to metal ions such as zinc, copper,
and iron.
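
The pK values quoted above can be turned into an approximate average side-group charge at a given pH by means of the Henderson-Hasselbalch relation. The sketch below is illustrative only: it treats each side group as an isolated titratable group with exactly the pK stated in the text, whereas, as just noted for HIS, the actual pK depends on the local environment. The names used are hypothetical.

```python
def fraction_protonated(pH: float, pK: float) -> float:
    """Henderson-Hasselbalch fraction of a group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pK))


# Side-group pK values as quoted in the text above.
PK = {"LYS": 10.4, "ARG": 12.0, "ASP": 4.5, "GLU": 4.6, "HIS": 6.2}

# These groups are positively charged when protonated and neutral otherwise;
# ASP and GLU are neutral when protonated and negative otherwise.
POSITIVE_WHEN_PROTONATED = {"LYS", "ARG", "HIS"}


def mean_charge(residue: str, pH: float) -> float:
    """Average side-group charge of `residue` at the given pH."""
    f = fraction_protonated(pH, PK[residue])
    return f if residue in POSITIVE_WHEN_PROTONATED else f - 1.0
```

At pH 7, LYS and ARG are almost fully charged and ASP and GLU almost fully negative, while HIS, with a pK of 6.2, is only partly protonated, which is why its charge state is sensitive to small shifts in pH or pK.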

Aside from the charged amino acids, SER, THR, ASN, GLN,
TYR, and TRP can participate in hydrogen bonds.
Cross links:

The most important form of cross link is the disulfide
bond formed between the thiols of CYS residues. In a
suitably oxidizing environment, these bonds form
spontaneously. These bonds can greatly stabilize a
particular conformation of a protein or mini-protein. When
a mixture of oxidized and reduced thiol reagents is
present, exchange reactions take place that allow the most
stable conformation to predominate. Concerning disulfides
in proteins and peptides, see also KATZ90, MATS89, PERR84,
PERR86, SAUE86, WELL86, JANA89, HORV89, KISH85, and SCHN86.

Other cross links that form without need of specific
enzymes include:

1) (CYS)₄:Fe              Rubredoxin (in CREI84, p. 376)
2) (CYS)₄:Zn              Aspartate transcarbamylase (in
                          CREI84, p. 376) and Zn-fingers
                          (HARD90)
3) (HIS)₂(MET)(CYS):Cu    Azurin (in CREI84, p. 376) and
                          basic "blue" Cu cucumber protein
                          (GUSS88)
4) (HIS)₄:Cu              CuZn superoxide dismutase
5) (CYS)₄:(Fe₄S₄)         Ferredoxin (in CREI84, p. 376)
6) (CYS)₂(HIS)₂:Zn        Zinc-fingers (GIBS88, SUMM91)
7) (CYS)₃(HIS):Zn         Zinc-fingers (GAUS87, GIBS88)

Cross links having (HIS)₂(MET)(CYS):Cu have the potential
advantage that HIS and MET cannot form other cross links
without Cu.

Simply Variegated Codons 

The following simply variegated codons are useful
because they encode a relatively balanced set of amino
acids:

1) SNT, which encodes the set [L,P,H,R,V,A,D,G]: a) one
acidic (D) and one basic (R), b) both aliphatic (L,V)
and aromatic hydrophobics (H), c) large (L,R,H) and
small (G,A) side groups, d) rigid (P) and flexible (G)
amino acids, e) each amino acid encoded once.

2) RNG, which encodes the set [M,T,K,R,V,A,E,G]: a) one
acidic and two basic (not optimal, but acceptable), b)
hydrophilics and hydrophobics, c) each amino acid
encoded once.

3) RMG, which encodes the set [T,K,A,E]: a) one acidic, one
basic, one neutral hydrophilic, b) three favor α
helices, c) each amino acid encoded once.

4) VNT, which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]:
a) one acidic, one basic, b) all classes: charged,
neutral hydrophilic, hydrophobic, rigid and flexible,
etc., c) each amino acid encoded once.

5) RRS, which encodes the set [N,S,K,R,D,E,G²]: a) two
acidics, two basics, b) two neutral hydrophilics, c)
only glycine encoded twice.

6) NNT, which encodes the set
[F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA sequences
provide fifteen different amino acids; only serine is
repeated, all others are present in equal amounts (this
allows very efficient sampling of the library), b) there
are equal numbers of acidic and basic amino acids (D and
R, once each), c) all major classes of amino acids are
present: acidic, basic, aliphatic hydrophobic, aromatic
hydrophobic, and neutral hydrophilic.

7) NNG, which encodes the set
[L²,R²,S,W,P,Q,M,T,K,V,A,E,G,stop]: a) fair preponderance
of residues that favor formation of α-helices
[L,M,A,Q,K,E; and, to a lesser extent, S,R,T]; b) encodes
13 different amino acids. (VHG encodes a subset of the set
encoded by NNG: 9 amino acids in nine different DNA
sequences, with equal acids and bases, and 5/9 being
α-helix-favoring.)

For the initial variegation, NNT is preferred in most
cases. However, when the codon is encoding an amino acid to
be incorporated into an α helix, NNG is preferred.
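By way of illustration only, the amino-acid sets listed above can be checked by expanding each variegated codon against the standard genetic code. The following is a minimal sketch, assuming the standard IUPAC ambiguity codes (N, S, R, M, V, H, K) and the standard genetic code; the function name `encoded` is hypothetical and is not part of the described method.

```python
from itertools import product
from collections import Counter

# IUPAC nucleotide ambiguity codes (assumed meaning of N, S, R, M, V, H, K).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded(vg_codon):
    """Count how many DNA sequences of a variegated codon encode each residue."""
    choices = [IUPAC[ch] for ch in vg_codon]
    return Counter(CODON_TABLE["".join(c)] for c in product(*choices))


if __name__ == "__main__":
    for vg in ("SNT", "RNG", "RMG", "VNT", "RRS", "NNT", "NNG", "VHG", "NNK"):
        counts = encoded(vg)
        n_dna = sum(counts.values())
        n_aa = len([r for r in counts if r != "*"])
        print(vg, n_dna, "DNA sequences,", n_aa, "amino acids,",
              counts.get("*", 0), "stop(s)")
```

Running the loop reports, for each codon, the number of DNA sequences, the number of distinct amino acids, and the number of stops, which can be compared with items 1) to 7).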

Below, we analyze several simple variegations as to the
efficiency with which the libraries can be sampled.

Libraries of random hexapeptides encoded by (NNK)₆ have
been reported (SCOT90, CWIR90). Table 130 shows the
expected behavior of such libraries. NNK produces single
codons for PHE, TYR, CYS, TRP, HIS, GLN, ILE, MET, ASN, LYS,
ASP, and GLU (α set); two codons for each of VAL, ALA, PRO,
THR, and GLY (Φ set); and three codons for each of LEU, ARG,
and SER (Ω set). We have separated the 64,000,000 possible
sequences into 28 classes, shown in Table 130A, based on the
number of amino acids from each of these sets. The largest
class is ΦΩαααα with ~14.6% of the possible sequences.
Aside from any selection, all the sequences in one class
have the same probability of being produced. Table 130B
shows the probability that a given DNA sequence taken from
the (NNK)₆ library will encode a hexapeptide belonging to
one of the defined classes; note that only ~6.3% of DNA
sequences belong to the ΦΩαααα class.

Table 130C shows the expected numbers of sequences in
each class for libraries containing various numbers of
independent transformants (viz. 10⁶, 3·10⁶, 10⁷, 3·10⁷, 10⁸,
3·10⁸, 10⁹, and 3·10⁹). At 10⁶ independent transformants
(ITs), we expect to see 56% of the ΩΩΩΩΩΩ class, but only
0.1% of the αααααα class. The vast majority of sequences
seen come from classes for which less than 10% of the class
is sampled. Suppose a peptide from, for example, class
ΦΦΩΩαα is isolated by fractionating the library for binding
to a target. Consider how much we know about peptides that
are related to the isolated sequence. Because only 4% of
the ΦΦΩΩαα class was sampled, we cannot conclude that the
amino acids from the Ω set are in fact the best from the Ω
set. We might have LEU at position 2, but ARG or SER could
be better. Even if we isolate a peptide of the ΩΩΩΩΩΩ
class, there is a noticeable chance that better members of
the class were not present in the library.

With a library of 10⁷ ITs, we see that several classes
have been completely sampled, but that the αααααα class is
only 1.1% sampled. At 7.6·10⁷ ITs, we expect display of 50%
of all amino-acid sequences, but the classes containing
three or more amino acids of the α set are still poorly
sampled. To achieve complete sampling of the (NNK)₆ library
requires about 3·10⁹ ITs, 10-fold larger than the largest
(NNK)₆ library so far reported.
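The class-by-class figures quoted above can be reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions that are not spelled out in this passage: that the single NNK stop codon vanishes (31 usable codons), and that the number of times a given DNA sequence appears among the transformants is Poisson distributed. All function names are hypothetical.

```python
from math import comb, exp

USABLE_CODONS = 31   # 32 NNK codons minus the one stop (assumed to vanish)
LENGTH = 6           # hexapeptide


def classes():
    """Yield (n_alpha, n_phi, n_omega) for every composition class."""
    for a in range(LENGTH + 1):
        for p in range(LENGTH + 1 - a):
            yield a, p, LENGTH - a - p


def n_peptides(a, p, o):
    """Amino-acid sequences in a class: 12 alpha, 5 phi, 3 omega residues."""
    arrangements = comb(LENGTH, a) * comb(LENGTH - a, p)
    return arrangements * 12**a * 5**p * 3**o


def dna_per_peptide(a, p, o):
    """DNA sequences per peptide: 1, 2, or 3 codons per residue by set."""
    return 2**p * 3**o


def dna_fraction(a, p, o):
    """Probability that a random DNA sequence encodes a member of the class."""
    return n_peptides(a, p, o) * dna_per_peptide(a, p, o) / USABLE_CODONS**LENGTH


def fraction_sampled(a, p, o, transformants):
    """Poisson estimate of the fraction of a class seen at least once."""
    p_one = dna_per_peptide(a, p, o) / USABLE_CODONS**LENGTH
    return 1.0 - exp(-transformants * p_one)


def overall_coverage(transformants):
    """Fraction of all 20**6 amino-acid sequences expected to be displayed."""
    seen = sum(n_peptides(*c) * fraction_sampled(*c, transformants)
               for c in classes())
    return seen / 20**LENGTH


if __name__ == "__main__":
    for its in (1e6, 1e7, 7.6e7, 3e9):
        print(f"{its:.1e} ITs: coverage {overall_coverage(its):.3f}")
```

Under these assumptions the sketch gives 28 classes, a largest class (four α, one Φ, one Ω) holding about 14.6% of peptides and about 6.3% of DNA sequences, and roughly 50% overall coverage at 7.6·10⁷ ITs, in line with the figures in the text.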

Table 131 shows expectations for a library encoded by
(NNT)₄(NNG)₂. The expectations of abundance are independent
of the order of the codons or of interspersed unvaried
codons. This library encodes 0.133 times as many amino-acid
sequences, but there are only 0.0165 times as many DNA
sequences. Thus 5.0·10⁷ ITs (i.e. 60-fold fewer than
required for (NNK)₆) gives almost complete sampling of the
library. The results would be slightly better for (NNT)₆
and slightly, but not much, worse for (NNG)₆. The
controlling factor is the ratio of DNA sequences to
amino-acid sequences.

Table 132 shows the ratio of #DNA sequences/#AA
sequences for codons NNK, NNT, and NNG. For NNK and NNG, we
have assumed that the PBD is displayed as part of an
essential gene, such as gene III in Ff phage, as is
indicated by the phrase "assuming stops vanish". It is not
in any way required that such an essential gene be used. If
a non-essential gene is used, the analysis would be slightly
different; sampling of NNK and NNG would be slightly less
efficient. Note that (NNT)₆ gives 3.6-fold more amino-acid
sequences than (NNK)₅ but requires 1.7-fold fewer DNA
sequences. Note also that (NNT)₇ gives twice as many
amino-acid sequences as (NNK)₆, but 3.3-fold fewer DNA
sequences.
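The comparisons above follow from simple counting. The sketch below is illustrative and assumes, as the text does for NNK and NNG, that stop codons vanish, so that NNK contributes 31 DNA sequences for 20 amino acids, NNT 16 for 15, and NNG 15 for 13; the helper name `library_size` is hypothetical.

```python
# (amino acids encoded, DNA sequences) per codon, stops assumed to vanish.
CODONS = {"NNK": (20, 31), "NNT": (15, 16), "NNG": (13, 15)}


def library_size(pattern):
    """pattern: list of (codon, repeats) -> (#amino-acid seqs, #DNA seqs)."""
    n_aa = n_dna = 1
    for codon, repeats in pattern:
        aa, dna = CODONS[codon]
        n_aa *= aa**repeats
        n_dna *= dna**repeats
    return n_aa, n_dna


if __name__ == "__main__":
    nnk5 = library_size([("NNK", 5)])
    nnk6 = library_size([("NNK", 6)])
    nnt6 = library_size([("NNT", 6)])
    nnt7 = library_size([("NNT", 7)])
    mixed = library_size([("NNT", 4), ("NNG", 2)])

    print("(NNT)6 vs (NNK)5: AA ratio", nnt6[0] / nnk5[0],
          "DNA ratio", nnk5[1] / nnt6[1])
    print("(NNT)7 vs (NNK)6: AA ratio", nnt7[0] / nnk6[0],
          "DNA ratio", nnk6[1] / nnt7[1])
    print("(NNT)4(NNG)2 vs (NNK)6: AA ratio", mixed[0] / nnk6[0],
          "DNA ratio", mixed[1] / nnk6[1])
```

The DNA-to-amino-acid ratio for a single library is `n_dna / n_aa`; the closer it is to unity, the fewer transformants are needed to sample the library.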

Thus, while it is possible to use a simple mixture
(NNS, NNK or NNN) to obtain at a particular position all
twenty amino acids, these simple mixtures lead to a highly
biased set of encoded amino acids. This problem can be
overcome by use of complexly variegated codons.
Complexly Variegated Codons 

The nt distribution ("fxS") within the codon that
allows all twenty amino acids and that yields the largest
ratio of abundance of the least favored amino acid (lfaa) to
that of the most favored amino acid (mfaa), subject to the
constraints of equal abundances of acidic and basic amino
acids, least possible number of stop codons, and, for
convenience, the third base being T or G, is shown in Table
10A and yields DNA molecules encoding each type of amino
acid with the abundances shown. Other complexly variegated
codons are obtainable by relaxing one or more constraints.

Note that this chemistry encodes all twenty amino
acids, with acidic and basic amino acids being equiprobable,
and the most favored amino acid (serine) is encoded only
2.454 times as often as the least favored amino acid
(tryptophan). The "fxS" vg codon improves sampling most for
peptides containing several of the amino acids
[F,Y,C,W,H,Q,I,M,N,K,D,E] for which NNK or NNS provide only
one codon. Its sampling advantages are most pronounced when
the library is relatively small.

The results of omitting the requirements of equality of 
acids and bases and minimizing stop codons are shown in 
Table 10B. 

The advantages of an NNT codon are discussed elsewhere
in the present application. Unoptimized NNT provides 15
amino acids encoded by only 16 DNA sequences. It is
possible to improve on NNT with the distribution shown in
Table 10C, which gives five amino acids (SER, LEU, HIS, VAL,
ASP) in very nearly equal amounts. A further eight amino
acids (PHE, TYR, ILE, ASN, PRO, ALA, ARG, GLY) are present
at 78% the abundance of SER. THR and CYS remain at half the
abundance of SER. When variegating DNA for disulfide-bonded
micro-proteins, it is often desirable to reduce the
prevalence of CYS. This distribution allows 13 amino acids
to be seen at high level and gives no stops; the optimized
fxS distribution allows only 11 amino acids at high
prevalence.

The NNG codon can also be optimized. Table 10D shows
an approximately optimized ([ALA] = [ARG]) NNG codon. There




are, under this variegation, four equally most favored amino
acids: LEU, ARG, ALA, and GLU. Note that there is one
acidic and one basic amino acid in this set. There are two
equally least favored amino acids: TRP and MET. The ratio
of lfaa/mfaa is 0.5258. If this codon is repeated six
times, peptides composed entirely of TRP and MET are 2% as
common as peptides composed entirely of the most favored
amino acids. We refer to this as "the prevalence of
(TRP/MET)₆ in optimized NNG₆ vgDNA".

When synthesizing vgDNA by the "dirty bottle" method,
it is sometimes desirable to use only a limited number of
mixes. One very useful mixture is called the "optimized NNS
mixture", in which we average the first two positions of the
fxS mixture: T₁ = 0.24, C₁ = 0.17, A₁ = 0.33, G₁ = 0.26; the
second position is identical to the first; C₃ = G₃ = 0.5.
This distribution provides the amino acids ARG, SER, LEU,
GLY, VAL, THR, ASN, and LYS at greater than 5% plus ALA,
ASP, GLU, ILE, MET, and TYR at greater than 4%.
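The abundance of each amino acid under a complexly variegated codon is the sum, over its codons, of the product of the three per-position nucleotide fractions. The sketch below applies this to the "optimized NNS mixture" using the rounded fractions quoted above and the standard genetic code; the function name is hypothetical. With these rounded inputs ILE, MET, and TYR come out at about 3.96%, i.e. essentially 4%, which suggests that the unrounded mixture was used for the figures in the text.

```python
from itertools import product
from collections import defaultdict

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Rounded nucleotide fractions of the "optimized NNS mixture" from the text.
FIRST = {"T": 0.24, "C": 0.17, "A": 0.33, "G": 0.26}
SECOND = FIRST                      # second position identical to the first
THIRD = {"C": 0.5, "G": 0.5}


def residue_frequencies(pos1, pos2, pos3):
    """Return {residue: abundance} for independent per-position nt fractions."""
    freq = defaultdict(float)
    for (a, fa), (b, fb), (c, fc) in product(
            pos1.items(), pos2.items(), pos3.items()):
        freq[CODON_TABLE[a + b + c]] += fa * fb * fc
    return dict(freq)


if __name__ == "__main__":
    freqs = residue_frequencies(FIRST, SECOND, THIRD)
    for residue, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(residue, f"{100 * f:.2f}%")
```

The same function can be applied to any other per-position distribution, for example one with T:G::90:10 at the third position.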

An additional complexly variegated codon is of
interest. This codon is identical to the optimized NNT
codon at the first two positions and has T:G::90:10 at the
third position. This codon provides thirteen amino acids
(ALA, ILE, ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, PRO, TYR,
and HIS) at more than 5.5%. THR at 4.3% and CYS at 3.9% are
more common than the LFAAs of NNK (3.125%). The remaining
five amino acids are present at less than 1%. This codon
has the feature that all amino acids are present; sequences
having more than two of the low-abundance amino acids are
rare. When we isolate an SBD using this codon, we can be
reasonably sure that the first 13 amino acids were tested at
each position. A similar codon, based on optimized NNG,
could be used.

Table 10E shows some properties of an unoptimized NNS
(or NNK) codon. Note that there are three equally most-
favored amino acids: ARG, LEU, and SER. There are also
twelve equally least favored amino acids: PHE, ILE, MET,
TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP. Five amino
acids (PRO, THR, ALA, VAL, GLY) fall in between. Note that
a six-fold repetition of NNS gives sequences composed of the
amino acids [PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP,
GLU, CYS, and TRP] at only ~0.1% of the sequences composed
of [ARG, LEU, and SER]. Not only is this ≈20-fold lower
than the prevalence of (TRP/MET)₆ in optimized NNG₆ vgDNA,
but this low prevalence applies to twelve amino acids.
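The two prevalence figures compared above are each the sixth power of a per-codon lfaa/mfaa ratio. A minimal sketch follows, taking the ratio 0.5258 for optimized NNG from the text and 1/3 for unoptimized NNS or NNK (one codon for each least favored amino acid against three for each most favored); the function name is hypothetical. With these inputs the NNS figure is about 0.14% and the ratio between the two is roughly 15, the same order as the approximate value quoted in the text.

```python
def prevalence(lfaa_over_mfaa, repeats=6):
    """Abundance of peptides built only from least favored residues,
    relative to peptides built only from most favored residues."""
    return lfaa_over_mfaa ** repeats


optimized_nng = prevalence(0.5258)   # ratio quoted for optimized NNG
plain_nns = prevalence(1 / 3)        # 1 codon vs. 3 codons per residue

if __name__ == "__main__":
    print(f"optimized NNG: {100 * optimized_nng:.2f}%")
    print(f"unoptimized NNS: {100 * plain_nns:.3f}%")
    print(f"ratio: {optimized_nng / plain_nns:.1f}")
```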
Diffuse Mutagenesis 

Diffuse Mutagenesis can be applied to any part of the
protein at any time, but is most appropriate when some
binding to the target has been established. Diffuse
Mutagenesis can be accomplished by spiking each of the pure
nts activated for DNA synthesis (e.g. nt-phosphoramidites)
with a small amount of one or more of the other activated
nts. Preferably, the level of spiking is set so that only
a small percentage (1% to 0.00001%, for example) of the final
product will contain the initial DNA sequence. This will
ensure that many single, double, triple, and higher
mutations occur, but that recovery of the basic sequence
will be a possible outcome.
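The relation between the spiking level and the fraction of product retaining the initial sequence can be sketched as follows. This is an illustration under a simplifying assumption that is not stated in the text: each of the n spiked positions independently incorporates a non-parental nt with the same probability s, so that the parental fraction is (1 − s)ⁿ and the number of mutations per molecule is binomially distributed. The function names and the example of 60 positions are hypothetical.

```python
from math import comb


def spike_level(n_positions, parental_fraction):
    """Per-position probability of a non-parental nt that leaves the given
    fraction of molecules with the unchanged initial sequence."""
    return 1.0 - parental_fraction ** (1.0 / n_positions)


def mutation_count_distribution(n_positions, spike, max_mutations):
    """P(exactly k mutated positions) for k = 0..max_mutations (binomial)."""
    return [
        comb(n_positions, k) * spike**k * (1.0 - spike) ** (n_positions - k)
        for k in range(max_mutations + 1)
    ]


if __name__ == "__main__":
    n = 60                       # hypothetical: 20 codons spiked
    s = spike_level(n, 0.01)     # aim for 1% parental sequence
    print(f"spike level per position: {s:.4f}")
    for k, p in enumerate(mutation_count_distribution(n, s, 8)):
        print(k, f"{p:.4f}")
```

Lower target fractions of the parental sequence give higher spiking levels and shift the distribution toward multiple mutations.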

III.D. Special Considerations Relating to Variegation of
Micro-Proteins with Essential Cysteines

Several of the preferred simple or complex variegated
codons encode a set of amino acids which includes cysteine.
This means that some of the encoded binding domains will
feature one or more cysteines in addition to the invariant
disulfide-bonded cysteines. For example, at each NNT-
encoded position, there is a one in sixteen chance of
obtaining cysteine. If six codons are so varied, the
fraction of domains containing additional cysteines is
about 0.32 (i.e. 1 − (15/16)⁶). Odd numbers of cysteines
can lead to complications; see Perry and Wetzel (PERR84).
On the other hand, many disulfide-containing proteins
contain cysteines that do not form disulfides, e.g.
trypsin. The possibility of unpaired cysteines can be dealt
with in several ways:
First, the variegated phage population can be passed
over an immobilized reagent that strongly binds free thiols,
such as SulfoLink (catalogue number 44895 H from Pierce
Chemical Company, Rockford, Illinois, 61105). Another
product from Pierce is TNB-Thiol Agarose (Catalogue Code
20409 H). BioRad sells Affi-Gel 401 (catalogue 153-4599)
for this purpose.

Second, one can use a variegation that excludes
cysteines, such as:

NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],
VNS that gives [L²,P²,H,Q,R³,I,M,T²,N,K,S,V²,A²,E,D,G²],
NNG that gives [L²,S,W,P,Q,R²,M,T,K,V,A,E,G,Stop],
SNT that gives [L,P,H,R,V,A,D,G],
RNG that gives [M,T,K,R,V,A,E,G],
RMG that gives [T,K,A,E],
VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or
RRS that gives [N,S,K,R,D,E,G²].

However, each of these schemes has one or more of the
following disadvantages, relative to NNT: a) fewer amino
acids are allowed, b) amino acids are not evenly provided,
c) acidic and basic amino acids are not equally likely, or
d) stop codons occur. Nonetheless, NNG, NHT, and VNT are
almost as useful as NNT. NNG encodes 13 different amino
acids and one stop signal. Only two amino acids appear
twice in the 16-fold mix.

Thirdly, one can enrich the population for binding to
the preselected target, and evaluate selected sequences post
hoc for extra cysteines. Those that contain more cysteines
than the cysteines provided for conformational constraint
may be perfectly usable. It is possible that a disulfide
linkage other than the designed one will occur. This does
not mean that the binding domain defined by the isolated DNA
sequence is in any way unsuitable. The suitability of the
isolated domains is best determined by chemical and
biochemical evaluation of chemically synthesized peptides.

Lastly, one can block free thiols with reagents, such
as Ellman's reagent, iodoacetate, or methyl iodide, that
specifically bind free thiols and that do not react with
disulfides, and then leave the modified phage in the
population. It is to be understood that the blocking agent
may alter the binding properties of the micro-protein; thus,
one may use a variety of blocking reagents in the expectation
that different binding domains will be found. The
variegated population of thiol-blocked genetic packages is
fractionated for binding. If the DNA sequence of the
isolated binding micro-protein contains an odd number of
cysteines, then synthetic means are used to prepare micro-
proteins having each possible linkage and in which the odd
thiol is appropriately blocked. Nishiuchi (NISH82, NISH86,
and works cited therein) discloses methods of synthesizing
peptides that contain a plurality of cysteines so that each
thiol is protected with a different type of blocking group.
These groups can be selectively removed so that the
disulfide pairing can be controlled. We envision using such
a scheme with the alteration that one thiol either remains
blocked, or is unblocked and then reblocked with a different
reagent.

III.E. Planning the Second and Later Rounds of Variegation

The method of the present invention allows efficient
accumulation of information concerning the amino-acid
sequence of a binding domain having high affinity for a
predetermined target. Although one may obtain a highly
useful binding domain from a single round of variegation and
affinity enrichment, we expect that multiple rounds will be
needed to achieve the highest possible affinity and
specificity.

If the first round of variegation results in some
binding to the target, but the affinity for the target is
still too low, further improvement may be achieved by
variegation of the SBDs. Preferably, the process is
progressive, i.e. each variegation cycle produces a better
starting point for the next variegation cycle than the
previous cycle produced. Setting the level of variegation
such that the ppbd and many sequences related to the ppbd
sequence are present in detectable amounts ensures that the
process is progressive.

If the level of variegation is so high that the ppbd
sequence is present at such low levels that there is an
appreciable chance that no transformant will display the
PPBD, then the best SBD of the next round could be worse
than the PPBD. At an excessively high level of variegation,
each round of mutagenesis is independent of previous rounds
and there is no assurance of progressivity. This approach
can lead to valuable binding proteins, but repetition of
experiments with this level of variegation will not yield
progressive results. Excessive variation is not preferred.
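The chance that no transformant displays the PPBD can be estimated from the fraction of the variegated DNA that still carries the ppbd sequence. The following is a rough sketch only, assuming that each independent transformant draws its sequence independently from the vgDNA population; the function name and the example numbers are hypothetical.

```python
from math import exp, log1p


def prob_parent_absent(parent_fraction, transformants):
    """Probability that none of the independent transformants carries the
    parental sequence, given its fraction in the variegated DNA."""
    return exp(transformants * log1p(-parent_fraction))


if __name__ == "__main__":
    n_its = 1e7   # hypothetical library size
    for fraction in (1e-5, 1e-6, 1e-7, 1e-8):
        print(f"parental fraction {fraction:.0e}: "
              f"P(absent) = {prob_parent_absent(fraction, n_its):.3g}")
```

When the product of the parental fraction and the number of transformants falls to about one or below, the probability of losing the parental sequence becomes appreciable, which corresponds to the non-progressive regime described above.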

Progressivity is not an all-or-nothing property. So
long as most of the information obtained from previous
variegation cycles is retained and many different surfaces
that are related to the PPBD surface are produced, the
process is progressive.

If the level of variegation in the previous variegation
cycle was correctly chosen, then the amino acids selected to
be in the residues just varied are the ones best determined.
The environment of other residues has changed, so that it is
appropriate to vary them again. Because there are often
more residues of interest than can be varied simultaneously,
we may continue by picking residues that either have never
been varied (highest priority) or that have not been varied
for one or more cycles.

Use of NNT or NNG variegated codons leads to very
efficient sampling of variegated libraries because the ratio
of (different amino-acid sequences)/(different DNA sequences)
is much closer to unity than it is for NNK or even the
optimized vg codon (fxS). Nevertheless, a few amino acids
are omitted in each case. Both NNT and NNG allow members of
all important classes of amino acids: hydrophobic,
hydrophilic, acidic, basic, neutral hydrophilic, small, and
large. After selecting a binding domain, a subsequent
variegation and selection may be desirable to achieve a
higher affinity or specificity. During this second
variegation, amino acid possibilities overlooked by the
preceding variegation may be investigated.

A few examples may be helpful. Suppose we obtained PRO
using NNT. This amino acid is available with either NNT or
NNG. We can be reasonably sure that PRO is the best amino
acid from the set [PRO, LEU, VAL, THR, ALA, ARG, GLY, PHE,
TYR, CYS, HIS, ILE, ASN, ASP, SER]. We next might try a set
that includes [PRO, TRP, GLN, MET, LYS, GLU]. The set
allowed by NNG is the preferred set.

What if we obtained HIS instead? Histidine is aromatic
and fairly hydrophobic and can form hydrogen bonds to and
from the imidazole ring. Tryptophan is hydrophobic and
aromatic and can donate a hydrogen to a suitable acceptor
and was excluded by the NNT codon. Methionine was also
excluded and is hydrophobic. Thus, one preferred course is
to use the variegated codon HDS that allows [HIS, GLN, ASN,
LYS, TYR, CYS, TRP, ARG, SER, GLY, <Stop>].

If the first round of variegation is entirely
unsuccessful, a different pattern of variegation should be
used. For example, if more than one interaction set can be
defined within a domain, the residues varied in the next
round of variegation should be from a different set than
that probed in the initial variegation. If repeated
failures are encountered, one may switch to a different
IPBD.

IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING DOMAINS ON
THE SURFACE OF A "GENETIC PACKAGE"

IV.A. General Requirements for Genetic Packages

In order to obtain the display of a multitude of
different though related potential binding domains,
applicants generate a heterogeneous population of replicable
genetic packages each of which comprises a hybrid gene
including a first DNA sequence which encodes a potential
binding domain for the target of interest and a second DNA
sequence which encodes a display means, such as an outer
surface protein native to the genetic package but not
natively associated with the potential binding domain (or
the parental binding domain to which it is related) which
causes the genetic package to display the corresponding
chimeric protein (or a processed form thereof) on its outer
surface.

The component of a population that exhibits the desired
binding properties may be quite small, for example, one in
10⁶ or less. Once this component of the population is
separated from the non-binding components, it must be
possible to amplify it. Culturing viable cells is the most
powerful amplification of genetic material known and is
preferred. Genetic messages can also be amplified in vitro,
e.g. by PCR, but this is not the most preferred method.

Preferably, the GP can be: 1) genetically altered with
reasonable facility to encode a potential binding domain, 2)
maintained and amplified in culture, 3) manipulated to
display the potential binding protein domain where it can
interact with the target material during affinity
separation, and 4) affinity separated while retaining the
genetic information encoding the displayed binding domain in
recoverable form. Preferably, the GP remains viable after
affinity separation. Preferred GPs are vegetative bacterial
cells, bacterial spores and, especially, bacterial DNA
viruses. Eukaryotic cells and eukaryotic viruses may be
used as genetic packages, but are not preferred.

When the genetic package is a bacterial cell, or a
phage which is assembled periplasmically, the display means
has two components. The first component is a secretion
signal which directs the initial expression product to the
inner membrane of the cell (a host cell when the package is
a phage). This secretion signal is cleaved off by a signal
peptidase to yield a processed, mature, potential binding
protein. The second component is an outer surface transport
signal which directs the package to assemble the processed
protein into its outer surface. Preferably, this outer
surface transport signal is derived from a surface protein
native to the genetic package.

For example, in a preferred embodiment, the hybrid gene
comprises a DNA encoding a potential binding domain operably
linked to a signal sequence (e.g., the signal sequences of
the bacterial phoA or bla genes or the signal sequence of
M13 phage gene III) and to DNA encoding a coat protein
(e.g., the M13 gene III or gene VIII proteins) of a
filamentous phage (e.g., M13). The expression product is
transported to the inner membrane (lipid bilayer) of the
host cell, whereupon the signal peptide is cleaved off to
leave a processed hybrid protein. The C-terminus of the
coat protein-like component of this hybrid protein is
trapped in the lipid bilayer, so that the hybrid protein
does not escape into the periplasmic space. (This is
typical of the wild-type coat protein.) As the single-
stranded DNA of the nascent phage particle passes into the
periplasmic space, it collects both wild-type coat protein
and the hybrid protein from the lipid bilayer. The hybrid
protein is thus packaged into the surface sheath of the
filamentous phage, leaving the potential binding domain
exposed on its outer surface. (Thus, the filamentous phage,
not the host bacterial cell, is the "replicable genetic
package" in this embodiment.)

If a secretion signal is necessary for the display of
the potential binding domain, in an especially preferred
embodiment the bacterial cell in which the hybrid gene is
expressed is of a "secretion-permissive" strain.

When the genetic package is a bacterial spore, or a
phage (such as ΦX174 or λ) whose coat is assembled
intracellularly, a secretion signal directing the expression
product to the inner membrane of the host bacterial cell is
unnecessary. In these cases, the display means is merely
the outer surface transport signal, typically a derivative
of a spore or phage coat protein.

Preferred OSPs for several GPs are given in Table 2.
References to osp-ipbd fusions in this section should be
taken to apply, mutatis mutandis, to osp-pbd and osp-sbd
fusions as well.

Periplasmically assembled phage are preferred when the
IPBD is a disulfide-bonded micro-protein, as such IPBDs may
not fold within a cell (these proteins may fold after the
phage is released from the cell). Intracellularly assembled
phage are preferred when the IPBD needs large or insoluble
prosthetic groups (such as Fe₄S₄ clusters), since the IPBD
may not fold if secreted because the prosthetic group is
lacking in the periplasm.

When variegation is introduced, multiple infections
could generate hybrid GPs that carry the gene for one PBD
but have at least some copies of a different PBD on their
surfaces; it is preferable to minimize this possibility by
infecting cells with phage under conditions resulting in a
low multiplicity of infection (MOI).
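The benefit of a low MOI can be illustrated numerically. The sketch below assumes, as is conventional but is not stated in the text, that the number of phage infecting a given cell follows a Poisson distribution with mean equal to the MOI; the function name is hypothetical.

```python
from math import exp


def multiply_infected_fraction(moi):
    """Among infected cells, the fraction that received two or more phage,
    assuming Poisson-distributed infections with mean `moi`."""
    p_none = exp(-moi)
    p_one = moi * p_none
    return (1.0 - p_none - p_one) / (1.0 - p_none)


if __name__ == "__main__":
    for moi in (0.01, 0.1, 1.0, 5.0):
        print(f"MOI {moi}: {100 * multiply_infected_fraction(moi):.1f}% "
              "of infected cells carry more than one phage")
```

Under this model the fraction of infected cells that could produce hybrid GPs falls roughly in proportion to the MOI when the MOI is well below one.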

Bacteriophages are excellent candidates for GPs because 
there is little or no enzymatic activity associated with 
intact mature phage, and because the genes are inactive 
outside a bacterial host, rendering the mature phage 
particles metabolically inert. 

For a given bacteriophage, the preferred OSP is usually
one that is present on the phage surface in the largest
number of copies. Nevertheless, an OSP such as M13 gIII
protein (5 copies/phage) may be an excellent choice as OSP
to cause display of the PBD.

It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted either
into a second copy of the recipient osp gene or into a novel
engineered osp gene. It is preferred that the osp-ipbd gene
be placed under control of a regulated promoter.

The user must choose a site in the candidate OSP gene
for inserting an ipbd gene fragment. The coats of most
bacteriophage are highly ordered. In such bacteriophage, it
is important to retain in engineered OSP-IPBD fusion
proteins those residues of the parental OSP that interact
with other proteins in the virion. For M13 gVIII, we
preferably retain the entire mature protein, while for M13
gIII, it might suffice to retain the last 100 residues
(BASS90) (or even fewer). Such a truncated gIII protein
would be expressed in parallel with the complete gIII
protein, as gIII protein is required for phage infectivity.

The filamentous phage, which include M13, f1, fd, If1,
Ike, Xf, Pf1, and Pf3, are of particular interest. The
major coat protein is encoded by gene VIII. The 50 amino
acid mature gene VIII coat protein is synthesized as a 73
amino acid precoat (ITOK79). The first 23 amino acids
constitute a typical signal sequence which causes the
nascent polypeptide to be inserted into the inner cell
membrane.

An E. coli signal peptidase (SP-I) recognizes amino
acids 18, 21, and 23, and, to a lesser extent, residue 22,
and cuts between residues 23 and 24 of the precoat (KUHN85a,
KUHN85b, OLIV87). After removal of the signal sequence, the
amino terminus of the mature coat is located on the
periplasmic side of the inner membrane; the carboxy terminus
is on the cytoplasmic side. About 3000 copies of the mature
50 amino acid coat protein associate side-by-side in the
inner membrane.

The sequence of gene VIII is known, and the amino acid
sequence can be encoded on a synthetic gene, using the lacUV5
promoter in conjunction with the LacIq repressor.
The lacUV5 promoter is induced by IPTG. Mature gene VIII
protein makes up the sheath around the circular ssDNA. The
3D structure of f1 virion is known at medium resolution; the
amino terminus of gene VIII protein is on the surface of the
virion and is therefore a preferred attachment site for the
potential binding domain. A few modifications of gene VIII
have been made and are discussed below. The 2D structure of
M13 coat protein is implicit in the 3D structure. Mature
M13 gene VIII protein has only one domain.

We have constructed a tripartite gene comprising:
1) DNA encoding a signal sequence directing secretion of
parts (2) and (3) through the inner membrane,
2) DNA encoding the mature BPTI sequence, and
3) DNA encoding the mature M13 gVIII protein.
This gene causes BPTI to appear in active form on the
surface of M13 phage.

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is
AA_seq1

* 1 2 \\2 3 3 4 4 5 
5 0 5 0 \/5 0 5 0 5 0 
MKKSLVLKASVAVATLVE 

5 6 6 7 7 
5 0 5 0 3 
MVVVIVGATIGIKLFKKFTS KAS 

10 

The best site for inserting a novel protein domain into M13
CP is after A23 because SP-I cleaves the precoat protein
after A23, as indicated by the arrow. Proteins that can be
secreted will appear connected to mature M13 CP at its amino
terminus. Because the amino terminus of mature M13 CP is
located on the outer surface of the virion, the introduced
domain will be displayed on the outside of the virion. The
uncertainty of the mechanism by which M13 CP appears in the
lipid bilayer raises the possibility that direct insertion
of bpti into gene VIII may not yield a functional fusion
protein. It may be necessary to change the signal sequence
of the fusion to, for example, the phoA signal sequence
(MKQSTIALALLPLLFTPVTKA) (MARK91). Marks et al.
(MARK86) showed that the phoA signal peptide could direct
mature BPTI to the E. coli periplasm.

Another way to display the IPBD is to express
it as a domain of a chimeric gene containing part
or all of gene III. This gene encodes one of the minor coat
proteins of M13. Genes VI, VII, and IX also encode minor
coat proteins. Each of these minor proteins is present in
about 5 copies per virion and is related to morphogenesis or
infection. In contrast, the major coat protein is present
in more than 2500 copies per virion. The gene VI, VII, and
IX proteins are present at the ends of the virion; these
three proteins are not post-translationally processed
(RASC86).
The single-stranded circular phage DNA associates with
about five copies of the gene III protein and is then
extruded through the patch of membrane-associated coat
protein in such a way that the DNA is encased in a helical
sheath of protein (WEBS78). The DNA does not base pair
(that would impose severe restrictions on the virus genome);
rather the bases intercalate with each other independent of
sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. The
mini-protein's gene may be fused to gene III at the site used by
Smith and by de la Cruz et al., at a codon corresponding to
another domain boundary or to a surface loop of the protein,
or to the amino terminus of the mature protein.

All published works use a vector containing a single
modified gene III of fd. Thus, all five copies of gIII are
identically modified. Gene III is quite large (1272 b.p. or
about 20% of the phage genome) and it is uncertain whether
a duplicate of the whole gene can be stably inserted into
the phage. Furthermore, all five copies of gIII protein are
at one end of the virion. When bivalent target molecules
(such as antibodies) bind a pentavalent phage, the resulting
complex may be irreversible. Irreversible binding of the GP
to the target greatly interferes with affinity enrichment of
the GPs that carry the genetic sequences encoding the novel
polypeptide having the highest affinity for the target.

To reduce the likelihood of formation of irreversible
complexes, we may use a second, synthetic gene that encodes
carboxy-terminal parts of III; the carboxy-terminal parts of
the gene III protein cause it to assemble into the phage.
For example, the final 29 residues (starting with the
arginine specified by codon 398) may be enough to cause a
fusion protein to assemble into the phage. Alternatively,
one might include the final globular domain of mature gIII
protein, viz. the final 150 to 160 amino acids of gene III
(BASS90). We might, for example, engineer a gene that
consists of (from 5' to 3'):

1) a promoter (preferably regulated),
2) a ribosome-binding site,
3) an initiation codon,
4) a functional signal peptide directing secretion of
   parts (5) and (6) through the inner membrane,
5) DNA encoding an IPBD,
6) DNA encoding residues 275 through 424 of M13 gIII
   protein,
7) a translation stop codon, and
8) (optionally) a transcription stop signal.

We leave the wild-type gene III so that some unaltered gene
III protein will be present. Alternatively, we may use gene
VIII protein as the OSP and regulate the osp::ipbd fusion so
that only one or a few copies of the fusion protein appear
on the phage.

M13 gene VI, VII, and IX proteins are not processed
after translation. The route by which these proteins are
assembled into the phage has not been reported. These
proteins are necessary for normal morphogenesis and
infectivity of the phage. Whether these molecules (gene VI
protein, gene VII protein, and gene IX protein) attach
themselves to the phage: a) from the cytoplasm, b) from the
periplasm, or c) from within the lipid bilayer, is not
known. One could use any of these proteins to introduce an
IPBD onto the phage surface by one of the constructions:
1) ipbd::pmcp,
2) pmcp::ipbd,
3) signal::ipbd::pmcp, and
4) signal::pmcp::ipbd,
where ipbd represents DNA coding on expression for the
initial potential binding domain; pmcp represents DNA coding
for one of the phage minor coat proteins, VI, VII, and IX;
signal represents a functional secretion signal peptide,
such as the phoA signal (MKQSTIALALLPLLFTPVTKA); and "::"
represents in-frame genetic fusion. The indicated fusions
are placed downstream of a known promoter, preferably a
regulated promoter such as lacUV5, tac, or trp. Fusions (1)
and (2) are appropriate when the minor coat protein attaches
to the phage from the cytoplasm or by autonomous insertion
into the lipid bilayer. Fusion (1) is appropriate if the
amino terminus of the minor coat protein is free and (2) is
appropriate if the carboxy terminus is free. Fusions (3)
and (4) are appropriate if the minor coat protein attaches
to the phage from the periplasm or from within the lipid
bilayer. Fusion (3) is appropriate if the amino terminus of
the minor coat protein is free and (4) is appropriate if the
carboxy terminus is free.
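
The choice among constructions (1) through (4) amounts to a small
decision table: the attachment route decides whether a signal
sequence is needed, and the free terminus of the minor coat protein
decides the order of ipbd and pmcp. The following Python sketch is
illustrative only; the function name choose_fusion and the route
labels are hypothetical names introduced here, not part of the
disclosure.

```python
# Hypothetical helper: maps the attachment route of a minor coat protein
# and its free terminus onto one of the four constructions in the text.

# Routes for which no secretion signal is added (fusions 1 and 2).
NO_SIGNAL_ROUTES = {"cytoplasm", "autonomous insertion"}
# Routes for which a secretion signal is added (fusions 3 and 4).
SIGNAL_ROUTES = {"periplasm", "within lipid bilayer"}


def choose_fusion(route: str, free_terminus: str) -> str:
    """Return the construction, written with '::' for in-frame fusion."""
    if free_terminus == "amino":
        # A free amino terminus means the IPBD is placed upstream of pmcp.
        parts = ["ipbd", "pmcp"]
    elif free_terminus == "carboxy":
        # A free carboxy terminus means the IPBD is placed downstream.
        parts = ["pmcp", "ipbd"]
    else:
        raise ValueError("free_terminus must be 'amino' or 'carboxy'")

    if route in SIGNAL_ROUTES:
        parts.insert(0, "signal")
    elif route not in NO_SIGNAL_ROUTES:
        raise ValueError(f"unknown attachment route: {route!r}")

    return "::".join(parts)


if __name__ == "__main__":
    for route in sorted(NO_SIGNAL_ROUTES | SIGNAL_ROUTES):
        for terminus in ("amino", "carboxy"):
            print(f"{route:22s} {terminus:8s} {choose_fusion(route, terminus)}")
```

Because the attachment route of the gene VI, VII, and IX proteins is
not known, in practice one would construct and test several of these
fusions rather than rely on a single table lookup.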

Similar constructions could be made with other
filamentous phage. Pf3 is a well known filamentous phage
that infects Pseudomonas aeruginosa cells that harbor an
IncP-1 plasmid. The major coat protein of Pf3 is unusual in
having no signal peptide to direct its secretion. The
sequence has charged residues ASP7, ARG37, LYS40, and PHE44-COO-,
which is consistent with the amino terminus being exposed.
Thus, to cause an IPBD to appear on the surface of Pf3, we
construct a tripartite gene comprising:

1) a signal sequence known to cause secretion in P.
   aeruginosa (preferably known to cause secretion of
   IPBD), fused in-frame to
2) a gene fragment encoding the IPBD sequence, fused
   in-frame to
3) DNA encoding the mature Pf3 coat protein.

Optionally, DNA encoding a flexible linker of one to 10
amino acids and/or amino acids forming a recognition site
for a specific protease (e.g., Factor Xa) is introduced
between the ipbd gene fragment and the Pf3 coat-protein
gene. This tripartite gene is introduced into Pf3 so that
it does not interfere with expression of any Pf3 genes. To
reduce the possibility of genetic recombination, part (3) is
designed to have numerous silent mutations relative to the
wild-type gene. Once the signal sequence is cleaved off,
the IPBD is in the periplasm and the mature coat protein
acts as an anchor and phage-assembly signal. It does not
matter that this fusion protein comes to rest in the lipid
bilayer by a route different from the route followed by the
wild-type coat protein.

As described in WO90/02809, other phage, such as
bacteriophage ΦX174, large DNA phage such as λ or T4, and
even RNA phage, may with suitable adaptations and
modifications be used as GPs.

IV.C. Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial strain
which (1) may be grown in culture, (2) may be engineered to
display PBDs on its surface, and (3) is compatible with
affinity selection.

Among bacterial cells, the preferred genetic packages
are Salmonella typhimurium, Bacillus subtilis, Pseudomonas
aeruginosa, Vibrio cholerae, Klebsiella pneumoniae, Neisseria
gonorrhoeae, Neisseria meningitidis, Bacteroides nodosus,
Moraxella bovis, and especially Escherichia coli. The
potential binding mini-protein may be expressed as an insert
in a chimeric bacterial outer surface protein (OSP). All
bacteria exhibit proteins on their outer surfaces. E. coli
is the preferred bacterial GP and, for it, LamB is a
preferred OSP.

While most bacterial proteins remain in the cytoplasm,
others are transported to the periplasmic space (which lies
between the plasma membrane and the cell wall of
gram-negative bacteria), or are conveyed and anchored to the
outer surface of the cell. Still others are exported
(secreted) into the medium surrounding the cell. Those
characteristics of a protein that are recognized by a cell
and that cause it to be transported out of the cytoplasm and
displayed on the cell surface will be termed "outer-surface
transport signals".
Gram-negative bacteria have outer-membrane proteins
(OMPs) that form a subset of OSPs. Many OMPs span the
membrane one or more times. The signals that cause OMPs to
localize in the outer membrane are encoded in the amino acid
sequence of the mature protein. Outer membrane proteins of
bacteria are initially expressed in a precursor form
including a so-called signal peptide. The precursor protein
is transported to the inner membrane, and the signal peptide
moiety is extruded into the periplasmic space. There, it is
cleaved off by a "signal peptidase", and the remaining
"mature" protein can now enter the periplasm. Once there,
other cellular mechanisms recognize structures in the mature
protein which indicate that its proper place is on the outer
membrane, and transport it to that location.

It is well known that the DNA coding for the leader or
signal peptide from one protein may be attached to the DNA
sequence coding for another protein, protein X, to form a
chimeric gene whose expression causes protein X to appear
free in the periplasm. The use of export-permissive
bacterial strains (LISS85, STAD89) increases the probability
that a signal-sequence fusion will direct the desired
protein to the cell surface.

OSP-IPBD fusion proteins need not fill a structural
role in the outer membranes of Gram-negative bacteria
because parts of the outer membranes are not highly ordered.
For large OSPs there is likely to be one or more sites at
which osp can be truncated and fused to ipbd such that cells
expressing the fusion will display IPBDs on the cell
surface. Fusions of fragments of omp genes with fragments
of an x gene have led to X appearing on the outer membrane
(CHAR88b,c, BENS84, CLEM81). When such fusions have been
made, we can design an osp-ipbd gene by substituting ipbd
for x in the DNA sequence. Otherwise, a successful OMP-IPBD
fusion is preferably sought by fusing fragments of the best
omp to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use the
available data about the OMP to pick the point or points of
fusion between omp and ipbd to maximize the likelihood that
IPBD will be displayed. (Spacer DNA encoding flexible
linkers, made, e.g., of GLY, SER, and ASN, may be placed
between the osp- and ipbd-derived fragments to facilitate
display.) Alternatively, we truncate osp at several sites
or in a manner that produces osp fragments of variable
length and fuse the osp fragments to ipbd; cells expressing
the fusion are screened or selected for display of IPBDs on
the cell surface. Freudl et al. (FREU89) have shown that
fragments of OSPs (such as OmpA) above a certain size are
incorporated into the outer membrane. An additional
alternative is to include short segments of random DNA in
the fusion of omp fragments to ipbd and then screen or
select the resulting variegated population for members
exhibiting the display-of-IPBD phenotype.

In E. coli, the LamB protein is a well understood OSP
and can be used. The E. coli LamB has been expressed in
functional form in S. typhimurium, V. cholerae, and K.
pneumoniae, so that one could display a population of PBDs in any
of these species as a fusion to E. coli LamB. K. pneumoniae
expresses a maltoporin similar to LamB (WEHM89) which could
also be used. In P. aeruginosa, the D1 protein (a
homologue of LamB) can be used (TRIA88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 43 amino acids of the mature sequence are required for
successful transport (BENS84). As with other OSPs, LamB of
E. coli is synthesized with a typical signal sequence which
is subsequently removed. Homology between parts of LamB
protein and other outer membrane proteins OmpC, OmpF, and
PhoE has been detected (NIKA84), including homology between
LamB amino acids 39-49 and sequences of the other proteins.
These subsequences may label the proteins for transport to
the outer membrane.

The amino acid sequence of LamB is known (CLEM81), and
a model has been developed of how it anchors itself to the
outer membrane (reviewed by, among others, BENZ88b). The
locations of its maltose and phage binding domains are also
known (HEIN88). Using this information, one may identify
several strategies by which a PBD insert may be incorporated
into LamB to provide a chimeric OSP which displays the PBD
on the bacterial outer membrane.

When the PBDs are to be displayed by a chimeric
transmembrane protein like LamB, the PBD could be inserted into
a loop normally found on the surface of the cell (cf.
BECK83, MANO86). Alternatively, we may fuse a 5' segment of
the osp gene to the ipbd gene fragment; the point of fusion
is picked to correspond to a surface-exposed loop of the OSP
and the carboxy terminal portions of the OSP are omitted.
In LamB, it has been found that up to 60 amino acids may be
inserted (CHAR88b,c) with display of the foreign epitope
resulting; the structural features of OmpC, OmpA, OmpF, and
PhoE are so similar that one expects similar behavior from
these proteins.

It should be noted that while LamB may be characterized 
as a binding protein, it is used in the present invention to 
provide an OSTS; its binding domains are not variegated. 

Other bacterial outer surface proteins, such as OmpA,
OmpC, OmpF, PhoE, and pilin, may be used in place of LamB
and its homologues. OmpA is of particular interest because
it is very abundant and because homologues are known in a
wide variety of gram-negative bacterial species. Baker et
al. (BAKE87) review assembly of proteins into the outer
membrane of E. coli and cite a topological model of OmpA
(VOGE86) that predicts that residues 19-32, 62-73, 105-118,
and 147-158 are exposed on the cell surface. Insertion of
an ipbd-encoding fragment at about codon 111 or at about
codon 152 is likely to cause the IPBD to be displayed on the
cell surface. Concerning OmpA, see also MACI88 and MANO88.
Porin Protein F of Pseudomonas aeruginosa has been cloned
and has sequence homology to OmpA of E. coli (DUCH88).
Although this homology is not sufficient to allow prediction
of surface-exposed residues on Porin Protein F, the methods
used to determine the topological model of OmpA may be
applied to Porin Protein F. Works related to use of OmpA as
an OSP include BECK80 and MACI88.
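
The relation between the suggested OmpA insertion codons and the
surface-exposed segments of the topological model can be checked
mechanically. The short Python sketch below is only an illustration:
the segment boundaries are those quoted above from VOGE86, while the
function name exposed_segment and the assumption that codon numbers
and residue numbers coincide are ours.

```python
# Surface-exposed residue ranges of OmpA as quoted in the text (VOGE86),
# stored as inclusive (first, last) pairs.
OMPA_EXPOSED_SEGMENTS = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_segment(codon, segments=OMPA_EXPOSED_SEGMENTS):
    """Return the exposed segment containing `codon`, or None.

    Assumes (for illustration) that codon numbering matches the residue
    numbering used by the topological model.
    """
    for first, last in segments:
        if first <= codon <= last:
            return (first, last)
    return None


if __name__ == "__main__":
    for site in (111, 152, 50):
        print(site, exposed_segment(site))
```

On this check, codons 111 and 152 fall near the middle of the third
and fourth exposed segments respectively, whereas a position such as
50 lies outside every predicted loop.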

Misra and Benson (MISR88a, MISR88b) disclose a
topological model of E. coli OmpC that predicts that, among
others, residues GLY164 and LEU250 are exposed on the cell
surface. Thus insertion of an ipbd gene fragment at about
codon 164 or at about codon 250 of the E. coli ompC gene or
at corresponding codons of the S. typhimurium ompC gene is
likely to cause IPBD to appear on the cell surface. The
ompC genes of other bacterial species may be used. Other
works related to OmpC include CATR87 and CLIC88.

OmpF of E. coli is a very abundant OSP, on the order of 10^4
copies/cell. Pages et al. (PAGE90) have published a model of OmpF
indicating seven surface-exposed segments. Fusion of an
ipbd gene fragment, either as an insert or to replace the 3'
part of ompF, in one of the indicated regions is likely to
produce a functional ompF::ipbd gene, the expression of which
leads to display of IPBD on the cell surface. In
particular, fusion at about codon 111, 177, 217, or 245
should lead to a functional ompF::ipbd gene. Concerning
OmpF, see also REID88b, PAGE88, BENS88, TOMM82, and SODE85.

Pilus proteins are of particular interest because
piliated cells express many copies of these proteins and
because several species (N. gonorrhoeae, P. aeruginosa,
Moraxella bovis, Bacteroides nodosus, and E. coli) express
related pilins. Getzoff and coworkers (GETZ88, PARG87,
SOME85) have constructed a model of the gonococcal pilus
that predicts that the protein forms a four-helix bundle
having structural similarities to tobacco mosaic virus
protein and myohemerythrin. On this model, both the amino
and carboxy termini of the protein are exposed. The amino
terminus is methylated. Elleman (ELLE88) has reviewed
pilins of Bacteroides nodosus and other species and reports
that serotype differences can be related to differences in the pilin
protein and that most variation occurs in the C-terminal
region. The amino-terminal portions of the pilin protein
are highly conserved. Jennings et al. (JENN89) have grafted
a fragment of foot-and-mouth disease virus (residues 144-159)
into the B. nodosus type 4 fimbrial protein, which is
highly homologous to gonococcal pilin. They found that
expression of the 3'-terminal fusion in P. aeruginosa led to
a viable strain that makes detectable amounts of the fusion
protein. Jennings et al. did not vary the foreign epitope
nor did they suggest any variation. They inserted a GLY-GLY
linker between the last pilin residue and the first residue
of the foreign epitope to provide a "flexible linker". Thus
a preferred place to attach an IPBD is the carboxy terminus.
The exposed loops of the bundle could also be used, although
the particular internal fusions tested by Jennings et al.
(JENN89) appeared to be lethal in P. aeruginosa. Concerning
pilin, see also MCKE85 and ORND85.

Judd (JUDD86, JUDD85) has investigated Protein IA of N.
gonorrhoeae and found that the amino terminus is exposed;
thus, one could attach an IPBD at or near the amino terminus
of the mature P.IA as a means to display the IPBD on the N.
gonorrhoeae surface.

A model of the topology of PhoE of E. coli has been
disclosed by van der Ley et al. (VAND86). This model
predicts eight loops that are exposed; insertion of an IPBD
into one of these loops is likely to lead to display of the
IPBD on the surface of the cell. Residues 158, 201, 238,
and 275 are preferred locations for insertion of an IPBD.

Other OSPs that could be used include E. coli BtuB,
FepA, FhuA, IutA, FecA, and FhuE (GUDM89), which are
receptors for nutrients usually found in low abundance. The
genes of all these proteins have been sequenced, but
topological models are not yet available. Gudmundsdottir et
al. (GUDM89) have begun the construction of such a model for
BtuB and FepA by showing that certain residues of BtuB face
the periplasm and by determining the functionality of
various BtuB::FepA fusions. Carmel et al. have
reported work of a similar nature for FhuA. All Neisseria
species express outer surface proteins for iron transport
that have been identified and, in many cases, cloned. See
also MORS87 and MORS88.

Many gram-negative bacteria express one or more
phospholipases. E. coli phospholipase A, product of the
pldA gene, has been cloned and sequenced by de Geus et al.
(DEGE84). They found that the protein appears at the cell
surface without any posttranslational processing. An ipbd
gene fragment can be attached at either terminus or inserted
at positions predicted to encode loops in the protein. That
phospholipase A arrives on the outer surface without removal
of a signal sequence does not prove that a PldA::IPBD fusion
protein will also follow this route. Thus we might cause a
PldA::IPBD or IPBD::PldA fusion to be secreted into the
periplasm by addition of an appropriate signal sequence.
Thus, in addition to simple binary fusion of an ipbd
fragment to one terminus of pldA, the constructions:

1) ss::ipbd::pldA
2) ss::pldA::ipbd

should be tested. Once the PldA::IPBD protein is free in
the periplasm it does not remember how it got there and the
structural features of PldA that cause it to localize on the
outer surface will direct the fusion to the same
destination.

IV.D. Bacterial Spores as Genetic Packages:

Bacterial spores have desirable properties as GP
candidates. Spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical agents,
and hence permit the use of a great variety of affinity
selection conditions. Also, Bacillus spores neither
actively metabolize nor alter the proteins on their surface.
Bacillus spores, and more especially B. subtilis spores, are
therefore the preferred sporoidal GPs. As discussed more
fully in WO90/02809, a foreign binding domain may be
introduced into an outer surface protein such as that
encoded by the B. subtilis cotC or cotD genes.

It is generally preferable to use as the genetic
package a cell, spore or virus for which an outer surface
protein which can be engineered to display an IPBD has
already been identified. However, as explained in
WO90/02809, the present invention is not limited to such
genetic packages, as an outer surface transport signal may
be generated by variegation-and-selection techniques.

V.E Genetic Construction and Expression Considerations

The (i)pbd-osp gene may be: a) completely synthetic, b)
a composite of natural and synthetic DNA, or c) a composite
of natural DNA fragments. The important point is that the
pbd segment be easily variegated so as to encode a
multitudinous and diverse family of PBDs as previously
described. A synthetic ipbd segment is preferred because it
allows greatest control over placement of restriction sites.
Primers complementary to regions abutting the osp-ipbd gene
on its 3' flank and to parts of the osp-ipbd gene that are
not to be varied are needed for sequencing.

The sequences of regulatory parts of the gene are taken
from the sequences of natural regulatory elements: a)
promoters, b) Shine-Dalgarno sequences, and c)
transcriptional terminators. Regulatory elements could also be
designed from knowledge of consensus sequences of natural
regulatory regions. The sequences of these regulatory
elements are connected to the coding regions; restriction
sites are also inserted in or adjacent to the regulatory
regions to allow convenient manipulation.

The essential function of the affinity separation is to
separate GPs that bear PBDs (derived from IPBD) having high
affinity for the target from GPs bearing PBDs having low
affinity for the target. If the elution volume of a GP
depends on the number of PBDs on the GP surface, then a GP
bearing many PBDs with low affinity, GP(PBD_w), might
co-elute with a GP bearing fewer PBDs with high affinity,
GP(PBD_s). Regulation of the osp-pbd gene preferably is such
that most packages display sufficient PBD to effect a good
separation according to affinity. Use of a regulatable
promoter to control the level of expression of the osp-pbd
gene allows fine adjustment of the chromatographic behavior of
the variegated population.

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through the
use of regulated promoters such as lacUV5, trpE, or tac
(MANI82). The factors that regulate the quantity of protein
synthesized are sufficiently well understood that a wide
variety of heterologous proteins can now be produced in E.
coli, B. subtilis and other host cells in at least moderate
quantities (BETT88). Preferably, the promoter for the
osp-ipbd gene is subject to regulation by a small chemical
inducer. For example, the lac promoter and the hybrid
trp-lac (tac) promoter are regulatable with isopropyl
thiogalactoside (IPTG). The promoter for the constructed
gene need not come from a natural osp gene; any regulatable
bacterial promoter can be used. A non-leaky promoter is
preferred.
The present invention is not limited to a single method
of gene design. The osp-ipbd gene need not be synthesized
in toto; parts of the gene may be obtained from nature.
One may use any genetic engineering method to produce the
correct gene fusion, so long as one can easily and
accurately direct mutations to specific sites in the pbd DNA
subsequence.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA. The
ambiguity in the genetic code is exploited to allow optimal
placement of restriction sites, to create various
distributions of amino acids at variegated codons, to
minimize the potential for recombination, and to reduce use
of codons that are poorly translated in the host cell.
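
One way to picture how the ambiguity of the genetic code permits
placement of restriction sites is to enumerate the synonymous DNA
encodings of a short peptide and keep those that contain a desired
recognition sequence. The Python sketch below is a minimal
illustration under stated assumptions: it uses the standard genetic
code, the name encodings_with_site is hypothetical, and exhaustive
enumeration is practical only for segments of a few residues.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Map each one-letter amino acid (or '*' for stop) to its synonymous codons.
SYNONYMS = {}
for codon, amino_acid in zip(CODONS, AMINO_ACIDS):
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def encodings_with_site(peptide, site):
    """List every DNA encoding of `peptide` that contains `site`.

    The number of encodings grows exponentially with peptide length,
    so this brute-force search is meant for short segments only.
    """
    hits = []
    for choice in product(*(SYNONYMS[aa] for aa in peptide)):
        dna = "".join(choice)
        if site in dna:
            hits.append(dna)
    return hits


if __name__ == "__main__":
    # GLY-SER can be encoded so as to create a BamHI site (GGATCC).
    print(encodings_with_site("GS", "GGATCC"))
    # LYS-LEU can be encoded so as to create a HindIII site (AAGCTT).
    print(encodings_with_site("KL", "AAGCTT"))
```

The same table of synonymous codons could be filtered to avoid codons
that are poorly translated in the chosen host, or to maximize the
number of silent differences from a wild-type gene; the host-specific
codon preferences needed for that are not given here.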

V.F Structural Considerations

The design of the amino-acid sequence that the ipbd-osp
gene is to encode involves a number of structural
considerations. The design is somewhat different for each
type of GP. In bacteria, OSPs are not essential, so there
is no requirement that the OSP domain of a fusion have any
of its parental functions beyond lodging in the outer
membrane.

It is desirable that the OSP not constrain the
orientation of the PBD domain; this is not to be confused
with lack of constraint within the PBD. Cwirla et al.
(CWIR90), Scott and Smith (SCOT90), and Devlin et al.
(DEVL90) have taught that variable residues in
phage-displayed random peptides should be free of influence from
the phage OSP. We teach that binding domains having a
moderate to high degree of conformational constraint will
exhibit higher specificity and that higher affinity is also
possible. Thus, we prescribe picking codons for variegation
that specify amino acids that will appear in a well-defined
framework. The nature of the side groups is varied through
a very wide range due to the combinatorial replacement of
multiple amino acids. The main chain conformations of most
PBDs of a given class are very similar. The movement of the
PBD relative to the OSP should not, however, be restricted.
Thus it is often appropriate to include a flexible linker
between the PBD and the OSP. Such flexible linkers can be
taken from naturally occurring proteins known to have
flexible regions. For example, the gIII protein of M13
contains glycine-rich regions thought to allow the
amino-terminal domains a high degree of freedom. Such flexible
linkers may also be designed. Segments of polypeptides that
are rich in the amino acids GLY, ASN, SER, and ASP are
likely to give rise to flexibility. Multiple glycines are
particularly preferred.

When we choose to insert the PBD into a surface loop of
an OSP such as LamB, OmpA, or M13 gIII protein, there are a
few considerations that do not arise when PBD is joined to
the end of an OSP. In these cases, the OSP exerts some
constraining influence on the PBD; the ends of the PBD are
held in more or less fixed positions. We could insert a
highly varied DNA sequence into the osp gene at codons that
encode a surface-exposed loop and select for cells that have
a specific-binding phenotype. When the identified
amino-acid sequence is synthesized (by any means), the constraint
of the OSP is lost and the peptide is likely to have a much
lower affinity for the target and a much lower specificity.
Tan and Kaiser (TANN77) found that a synthetic model of BPTI
containing all the amino acids of BPTI that contact trypsin
has a K_d for trypsin ~10^7 times higher than that of BPTI. Thus, it is
strongly preferred that the varied amino acids be part of a
PBD in which the structural constraints are supplied by the
PBD.

It is known that the amino acids adjoining foreign
epitopes inserted into LamB influence the immunological
properties of these epitopes (VAND90). We expect that PBDs
inserted into loops of LamB, OmpA, or similar OSPs will be
influenced by the amino acids of the loop and by the OSP in
general. To obtain appropriate display of the PBD, it may
be necessary to add one or more linker amino acids between
the OSP and the PBD. Such linkers may be taken from natural
proteins or designed on the basis of our knowledge of the
structural behavior of amino acids. Sequences rich in GLY,
SER, ASN, ASP, ARG, and THR are appropriate. One to five
amino acids at either junction are likely to impart the
desired degree of flexibility between the OSP and the PBD.

A preferred site for insertion of the ipbd gene into
the phage osp gene is one in which: a) the IPBD folds into
its original shape, b) the OSP domains fold into their
original shapes, and c) there is no interference between the
two domains.

If there is a model of the phage that indicates that
either the amino or carboxy terminus of an OSP is exposed to
solvent, then the exposed terminus of that mature OSP
becomes the prime candidate for insertion of the ipbd gene.
A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and carboxy
termini of the mature OSP are the best candidates for
insertion of the ipbd gene. A functional fusion may require
additional residues between the IPBD and OSP domains to
avoid unwanted interactions between the domains.
Random-sequence DNA, or DNA coding for a specific sequence of a
protein homologous to the IPBD or OSP, can be inserted
between the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also a
good approach for obtaining a functional fusion. Smith
exploited such a boundary when subcloning heterologous DNA
into gene III of f1 (SMIT85).

The criteria for identifying OSP domains suitable for
causing display of an IPBD are somewhat different from those
used to identify an IPBD. When identifying an OSP, minimal
size is not so important because the OSP domain will not
appear in the final binding molecule nor will we need to
synthesize the gene repeatedly in each variegation round.
The major design concerns are that: a) the OSP::IPBD fusion
cause display of IPBD, b) the initial genetic construction
be reasonably convenient, and c) the osp::ipbd gene be
genetically stable and easily manipulated. There are
several methods of identifying domains. Methods that rely
on atomic coordinates have been reviewed by Janin and
Chothia (JANI85). These methods use matrices of distances
between α carbons (Cα), dividing planes (ROSE85), or
buried surface (RASH84). Chothia and collaborators have
correlated the behavior of many natural proteins with domain
structure (according to their definition). Rashin correctly
predicted the stability of a domain comprising residues
206-316 of thermolysin (VITA84, RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify stable
domains. (See, for example, VITA84, POTE83, SCOT87a, and
PABO79.) Pabo et al. used calorimetry as an indicator that
the cI repressor from the coliphage λ contains two domains;
they then used partial proteolysis to determine the location
of the domain boundary.

If the only structural information available is the
amino acid sequence of the candidate OSP, we can use the
sequence to predict turns and loops. There is a high
probability that some of the loops and turns will be
correctly predicted (cf. Chou and Fasman (CHOU74)); these
locations are also candidates for insertion of the ipbd gene
fragment.

In bacterial OSPs, the major considerations are: a) 
that the PBD is displayed, and b) that the chimeric protein 
not be toxic.

From topological models of OSPs, we can determine
whether the amino or carboxy termini of the OSP are exposed.
If so, then these are excellent choices for fusion of the
osp fragment to the ipbd fragment.

The lamB gene has been sequenced and is available on a
variety of plasmids (CLEM81, CHAR88a,b). Numerous fusions
of fragments of lamB with a variety of other genes have been
used to study export of proteins in E. coli. From various
studies, Charbit et al. (CHAR88a,b) have proposed a model
that specifies which residues of LamB are: a) embedded in
the membrane, b) facing the periplasm, and c) facing the
cell surface; we adopt the numbering of this model for amino
acids in the mature protein. According to this model,
several loops on the outer surface are defined, including:
1) residues 88 through 111, 2) residues 145 through 165, and
3) residues 236 through 251.

Consider a mini-protein embedded in LamB. For example,
insertion of DNA encoding G1-N-X-C-X5-X-X-X-C-X10-S-G12 between
codons 153 and 154 of lamB is likely to lead to a wide variety
of LamB derivatives being expressed on the surface of E. coli
cells. G1, N2, S11, and G12 are supplied to allow the mini-protein
sufficient orientational freedom that it can interact
optimally with the target. Using affinity enrichment
(involving, for example, FACS via a fluorescently labeled
target, perhaps through several rounds of enrichment), we
might obtain a strain (named, for example, BEST) that
expresses a particular LamB derivative that shows high
affinity for the predetermined target. An octapeptide
having the sequence of the inserted residues 3 through 10
from BEST is likely to have an affinity and specificity
similar to that observed in BEST because the octapeptide has
an internal structure that keeps the amino acids in a
conformation that is quite similar in the LamB derivative
and in the isolated mini-protein.
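
The residue bookkeeping in the LamB example can be checked with a short sketch. It is offered only as an illustration: the function and variable names are hypothetical, the loop boundaries are the three ranges quoted above from the Charbit model, and "X" simply marks a variegated position.

```python
# Illustrative sketch only; names are hypothetical, numbers are those quoted in the text.

# Outer-surface loops of mature LamB per the cited model (inclusive residue ranges).
LAMB_SURFACE_LOOPS = [(88, 111), (145, 165), (236, 251)]

# The 12-residue insert G1 N2 X3 C4 X5 X6 X7 X8 C9 X10 S11 G12 ("X" = variegated).
INSERT_TEMPLATE = "GNXCXXXXCXSG"


def in_surface_loop(codon_before, codon_after):
    """True if both codons flanking the insertion point lie in one surface loop."""
    return any(lo <= codon_before and codon_after <= hi
               for lo, hi in LAMB_SURFACE_LOOPS)


def octapeptide(insert):
    """Return inserted residues 3 through 10 (1-based, inclusive)."""
    return insert[2:10]


if __name__ == "__main__":
    print("153/154 in a surface loop:", in_surface_loop(153, 154))
    print("residues 3-10 of the insert:", octapeptide(INSERT_TEMPLATE))
```

The octapeptide spans both cysteines, so the disulfide that constrains the displayed loop is retained in the isolated mini-protein.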

Fusing one or more new domains to a protein may make
the ability of the new protein to be exported from the cell
different from the ability of the parental protein. The
signal peptide of the wild-type coat protein may function
for authentic polypeptide but be unable to direct export of
a fusion. To utilize the Sec-dependent pathway, one may
need a different signal peptide. Thus, to express and
display a chimeric BPTI/M13 gene VIII protein, we found it
necessary to utilize a heterologous signal peptide (that of
phoA).

GPs that display peptides having high affinity for the
target may be quite difficult to elute from the target,
particularly a multivalent target. (Bacteria that are bound
very tightly can simply multiply in situ.) For phage, one
can introduce a cleavage site for a specific protease, such
as blood-clotting Factor Xa, into the fusion OSP protein so
that the binding domain can be cleaved from the genetic
package. Such cleavage has the advantage that all resulting
phage have identical OSPs and therefore are equally
infective, even if polypeptide-displaying phage can be
eluted from the affinity matrix without cleavage. This step
allows recovery of valuable genes which might otherwise be
lost. To our knowledge, no one has disclosed or suggested
using a specific protease as a means of recovering an
information-containing genetic package or of converting a
population of phage that vary in infectivity into phage
having identical infectivity.
IV.G. Synthesis of Gene Inserts

The present invention is not limited to any particular
method or strategy of DNA synthesis or construction.
Conventional DNA synthesizers may be used, with appropriate
reagent modifications for production of variegated DNA
(similar to that now used for production of mixed probes).
The osp-pbd genes may be created by inserting vgDNA
into an existing parental gene, such as the osp-ipbd shown
to be displayable by a suitably transformed GP. The present
invention is not limited to any particular method of
introducing the vgDNA, e.g., cassette mutagenesis or
single-stranded-oligonucleotide-directed mutagenesis.
IV.H. Operative Cloning Vector

The operative cloning vector (OCV) is a replicable
nucleic acid used to introduce the chimeric osp-ipbd or
ipbd-osp gene into the genetic package. When the genetic
package is a virus, it may serve as its own OCV. For cells
and spores, the OCV may be a plasmid, a virus, a phagemid,
or a chromosome.
IV.I. Transformation of cells:

When the GP is a cell, the population of GPs is created
by transforming the cells with suitable OCVs. When the GP
is a phage, the phage are genetically engineered and then
transfected into host cells suitable for amplification.
When the GP is a spore, cells capable of sporulation are
transformed with the OCV while in a normal metabolic state,
and then sporulation is induced so as to cause the OSP-PBDs
to be displayed. The present invention is not limited to
any one method of transforming cells with DNA.

The transformed cells are grown first under non-selective
conditions that allow expression of plasmid genes
and then selected to kill untransformed cells. Transformed
cells are then induced to express the osp-pbd gene at the
appropriate level of induction. The GPs carrying the IPBD
or PBDs are then harvested by methods appropriate to the GP
at hand, generally, centrifugation to pelletize GPs and
resuspension of the pellets in sterile medium (cells) or
buffer (spores or phage). They are then ready for
verification that the display strategy was successful (where
the GPs all display a "test" IPBD) or for affinity selection
(where the GPs display a variety of different PBDs).
IV.J. Verification of Display Strategy:

The harvested packages are tested to determine whether
the IPBD is present on the surface. In any tests of GPs for
the presence of IPBD on the GP surface, any ions or
cofactors known to be essential for the stability of IPBD or
AfM(IPBD) are included at appropriate levels. The tests can
be done, e.g., a) by affinity labeling, b) enzymatically,
c) spectrophotometrically, d) by affinity separation, or e)
by affinity precipitation. The AfM(IPBD) in this step is
one picked to have strong affinity (preferably,
Kd < 10^-11 M) for the IPBD molecule and little or no affinity
for the wtGP.

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS
V.A. Affinity Separation Technology, Generally

Affinity separation is used initially in the present
invention to verify that the display system is working,
i.e., that a chimeric outer surface protein has been
expressed and transported to the surface of the genetic
package and is oriented so that the inserted binding domain
is accessible to target material. When used for this
purpose, the binding domain is a known binding domain for a
particular target and that target is the affinity molecule
used in the affinity separation process. For example, a
display system may be validated by inserting DNA
encoding BPTI into a gene encoding an outer surface protein
of the genetic package of interest, and testing for binding
to anhydrotrypsin, which is normally bound by BPTI.

If the genetic packages bind to the target, then we
have confirmation that the corresponding binding domain is
indeed displayed by the genetic package. Packages which
display the binding domain (and thereby bind the target)
are separated from those which do not.

Once the display system is validated, it is possible to 
use a variegated population of genetic packages which 
display a variety of different potential binding domains,
and use affinity separation technology to determine how well
they bind to one or more targets. This target need not be
one bound by a known binding domain which is parental to the
displayed binding domains, i.e., one may select for binding
to a new target.

The term "affinity separation means" includes, but is
not limited to: a) affinity column chromatography, b) batch
elution from an affinity matrix material, c) batch elution
from an affinity material attached to a plate, d) fluorescence
activated cell sorting, and e) electrophoresis in the
presence of target material. "Affinity material" is used to
mean a material with affinity for the material to be
purified, called the "analyte". In most cases, the
association of the affinity material and the analyte is
reversible so that the analyte can be freed from the
affinity material once the impurities are washed away.
V.B. Affinity Chromatography, Generally

Affinity column chromatography, batch elution from an
affinity matrix material held in some container, and batch
elution from a plate are very similar and hereinafter will
be treated under "affinity chromatography."

If affinity chromatography is to be used, then: 

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be applied
to a solid support suitable for affinity separation,

2) after application to a matrix, the target material
preferably does not react with water,

3) after application to a matrix, the target material
preferably does not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be sufficiently
large that attaching the material to a matrix
allows enough unaltered surface area (generally at
least 500 Å², excluding the atom that is connected to
the linker) for protein binding.

Affinity chromatography is the preferred separation
means, but FACS, electrophoresis, or other means may also be
used.

The present invention makes use of affinity separation
of bacterial cells, or bacterial viruses (or other genetic
packages) to enrich a population for those cells or viruses
carrying genes that code for proteins with desirable binding
properties.

V.C. Target Materials

The present invention may be used to select for binding
domains which bind to one or more target materials, and/or
fail to bind to one or more target materials. Specificity,
of course, is the ability of a binding molecule to bind
strongly to a limited set of target materials, while binding
more weakly or not at all to another set of target materials
from which the first set must be distinguished.

The target materials may be organic macromolecules,
such as polypeptides, lipids, polynucleic acids, and
polysaccharides, but the present invention is not limited
to these classes of target material. The only limitation is that
the target material be suitable for affinity separation.
Thus, almost any molecule that is stable in aqueous solvent
may be used as a target.

Serine proteases such as human neutrophil elastase
(HNE) are an especially interesting class of potential
target materials. Serine proteases are ubiquitous in living
organisms and play vital roles in processes such as:
digestion, blood clotting, fibrinolysis, immune response,
fertilization, and post-translational processing of peptide
hormones. Although the role these enzymes play is vital,
uncontrolled or inappropriate proteolytic activity can be
very damaging.

V.D. Immobilization or Labeling of Target Material

For chromatography, FACS, or electrophoresis there may
be a need to covalently link the target material to a second
chemical entity. For chromatography the second entity is a
matrix, for FACS the second entity is a fluorescent dye, and
for electrophoresis the second entity is a strongly charged
molecule. In many cases, no coupling is required because
the target material already has the desired property of: a)
immobility, b) fluorescence, or c) charge. In other cases,
chemical or physical coupling is required.

It is not necessary that the actual target material be
used in preparing the immobilized or labeled analogue that
is to be used in affinity separation; rather, suitable
reactive analogues of the target material may be more
convenient. Target materials that do not have reactive
functional groups may be immobilized by first creating a
reactive functional group through the use of some powerful
reagent, such as a halogen. In some cases, the reactive
groups of the actual target material may occupy a part of
the target molecule that is to be left undisturbed. In that
case, additional functional groups may be introduced by
synthetic chemistry.

Two very general methods of immobilization are widely
used. The first is to biotinylate the compound of interest
and then bind the biotinylated derivative to immobilized
avidin. The second method is to generate antibodies to the
target material, immobilize the antibodies by any of
numerous methods, and then bind the target material to the
immobilized antibodies. Use of antibodies is more
appropriate for larger target materials; small targets
(those comprising, for example, ten or fewer non-hydrogen
atoms) may be so completely engulfed by an antibody that
very little of the target is exposed in the target-antibody
complex.

Non-covalent immobilization of hydrophobic molecules
without resort to antibodies may also be used. A compound,
such as 2,3,3-trimethyldecane, is blended with a matrix
precursor, such as sodium alginate, and the mixture is
extruded into a hardening solution. The resulting beads
will have 2,3,3-trimethyldecane dispersed throughout and
exposed on the surface.

Other immobilization methods depend on the presence of
particular chemical functionalities. A polypeptide will
present -NH2 (N-terminal; Lysines), -COOH (C-terminal;
Aspartic Acids; Glutamic Acids), -OH (Serines; Threonines;
Tyrosines), and -SH (Cysteines). For the reactivity of
amino acid side chains, see CREI84. A polysaccharide has
free -OH groups, as does DNA, which has a sugar backbone.

Matrices suitable for use as support materials include
polystyrene, glass, agarose and other chromatographic
supports, and may be fabricated into beads, sheets, columns,
wells, and other forms as desired.

Early in the selection process, relatively high
concentrations of target materials may be applied to the
matrix to facilitate binding; target concentrations may
subsequently be reduced to select for higher affinity SBDs.
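
Why lowering the target concentration favors higher-affinity binders can be seen from a simple equilibrium model. The sketch below assumes ideal 1:1 binding with target in excess, so that the fraction of a displayed domain bound is [T]/([T] + Kd); this model, the function names, and the two Kd values are illustrative assumptions and are not taken from the present disclosure. Multivalent display will in practice depart from this idealization.

```python
# Minimal sketch under an assumed 1:1 equilibrium binding model (not from the text).

def fraction_bound(target_conc, kd):
    """Fraction of binding domains occupied at free target concentration [T] (molar)."""
    return target_conc / (target_conc + kd)


def discrimination(target_conc, kd_tight, kd_weak):
    """Ratio of occupancy of a tight binder to that of a weak binder."""
    return fraction_bound(target_conc, kd_tight) / fraction_bound(target_conc, kd_weak)


if __name__ == "__main__":
    kd_tight, kd_weak = 1e-9, 1e-6  # hypothetical dissociation constants (M)
    for conc in (1e-5, 1e-7, 1e-9):
        ratio = discrimination(conc, kd_tight, kd_weak)
        print(f"[T] = {conc:.0e} M: tight/weak occupancy ratio = {ratio:.1f}")
```

At high target concentration both binders are nearly saturated and are hardly distinguished; as the concentration falls toward the Kd of the tighter binder, the occupancy ratio grows.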
V.E. Elution of Lower Affinity PBD-Bearing Genetic Packages

The population of GPs is applied to an affinity matrix
under conditions compatible with the intended use of the
binding protein and the population is fractionated by
passage of a gradient of some solute over the column. The
process enriches for PBDs having affinity for the target and
for which the affinity for the target is least affected by
the eluants used. The enriched fractions are those
containing viable GPs that elute from the column at greater
concentration of the eluant.

The eluants preferably are capable of weakening
noncovalent interactions between the displayed PBDs and the
immobilized target material. Preferably, the eluants do not
kill the genetic package; the genetic message corresponding
to successful mini-proteins is most conveniently amplified
by reproducing the genetic package rather than by in vitro
procedures such as PCR. The list of potential eluants
includes salts (including Na+, NH4+, Rb+, SO4--, H2PO4-,
citrate, K+, Li+, Cs+, HSO4-, CO3--, Ca++, Sr++, Cl-, PO4---,
HCO3-, Mg++, Ba++, Br-, HPO4-- and acetate), acid, heat,
compounds known to bind the target, and soluble target material
(or analogues thereof).

The uneluted genetic packages contain DNA encoding
binding domains which have a sufficiently high affinity for
the target material to resist the elution conditions. The
DNA encoding such successful binding domains may be
recovered in a variety of ways. Preferably, the bound
genetic packages are simply eluted by means of a change in
the elution conditions. Alternatively, one may culture the
genetic package in situ, or extract the target-containing
matrix with phenol (or other suitable solvent) and amplify
the DNA by PCR or by recombinant DNA techniques.
Additionally, if a site for a specific protease has been
engineered into the display vector, the specific protease is
used to cleave the binding domain from the GP.

Nonspecific binding to the matrix, etc., may be
identified or reduced by techniques well known in the
affinity separation art.

V.F. Recovery of Packages:

Recovery of packages that display binding to an
affinity column may be achieved in several ways, including:

1) collect fractions eluted from the column with a
gradient as described above; fractions eluting later
in the gradient contain GPs more enriched for genes
encoding PBDs with high affinity for the column,

2) elute the column with the target material in soluble
form,

3) flood the matrix with a nutritive medium and grow the
desired packages in situ,

4) remove parts of the matrix and use them to inoculate
growth medium,

5) chemically or enzymatically degrade the linkage
holding the target to the matrix so that GPs still
bound to target are eluted, or

6) degrade the packages and recover DNA with phenol or
other suitable solvent; the recovered DNA is used to
transform cells that regenerate GPs.

It is possible to utilize combinations of these methods. It
should be remembered that what we want to recover from the
affinity matrix is not the GPs per se, but the information
in them. Recovery of viable GPs is very strongly preferred,
but recovery of genetic material is essential. If cells,
spores, or virions bind irreversibly to the matrix but are
not killed, we can recover the information through in situ
cell division, germination, or infection, respectively.
Proteolytic degradation of the packages and recovery of DNA
is not preferred.

V.G. Amplifying the Enriched Packages

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the case
of phage, infection into a host so cultivated. If the GPs
have been inactivated by the chromatography, the OCV
carrying the osp-pbd gene is recovered from the GP and
introduced into a new, viable host.

V.H. Characterizing the Putative SBDs:

For one or more clonal isolates, we may subclone the
sbd gene fragment, without the osp fragment, into an expression
vector such that each SBD can be produced as a free
protein. Physical measurements of the strength of binding
may be made for each free SBD protein by any suitable
method.

If we find that the binding is not yet sufficient, we
decide which residues of the SBD (now a new PPBD) to vary
next. If the binding is sufficient, then we now have an
expression vector bearing a gene encoding the desired novel
binding protein.

V.I. Joint selections:

One may modify the affinity separation of the method
described to select a molecule that binds to material A but
not to material B, or that binds to both A and B, either
alternatively or simultaneously.
V.J. Engineering of Antagonists

It may be desirable to provide an antagonist to an
enzyme or receptor. This may be achieved by making a
molecule that prevents the natural substrate or agonist from
reaching the active site. Molecules that bind directly to
the active site may be either agonists or antagonists. Thus
we adopt the following strategy. We consider enzymes and
receptors together under the designation TER (Target Enzyme
or Receptor).

For most TERs, there exist chemical inhibitors that
block the active site. Usually, these chemicals are useful
only as research tools due to high toxicity. We make two
affinity matrices: one with active TER and one with blocked
TER. We make a variegated population of GP(PBD)s and select
for SBDs that bind to both forms of the enzyme, thereby
obtaining SBDs that do not bind to the active site. We
expect that SBDs will be found that bind different places on
the enzyme surface. Pairs of the sbd genes are fused with
an intervening peptide segment. For example, if SBD-1 and
SBD-2 are binding domains that show high affinity for the
target enzyme and for which the binding is non-competitive,
then the gene sbd-1::linker::sbd-2 encodes a two-domain
protein that will show high affinity for the target. We
make several fusions having a variety of SBDs and various
linkers. Such compounds have a reasonable probability of
being antagonists to the target enzyme.
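
The bookkeeping of this strategy can be sketched as follows. The sketch is illustrative only: the clone names, linker names, and function names are hypothetical, and the selection results would in practice come from the two affinity matrices described above.

```python
# Illustrative sketch; all identifiers and example data are hypothetical.
from itertools import permutations


def non_active_site_binders(bind_active_ter, bind_blocked_ter):
    """SBDs recovered from BOTH matrices, hence binding outside the active site."""
    return sorted(set(bind_active_ter) & set(bind_blocked_ter))


def fusion_genes(sbds, linkers):
    """Enumerate two-domain constructs of the form sbd-a::linker::sbd-b."""
    return [f"{a}::{linker}::{b}"
            for a, b in permutations(sbds, 2)
            for linker in linkers]


if __name__ == "__main__":
    on_active = {"sbd-1", "sbd-2", "sbd-3"}   # bound the active-TER matrix
    on_blocked = {"sbd-1", "sbd-2"}           # bound the blocked-TER matrix
    keep = non_active_site_binders(on_active, on_blocked)
    for gene in fusion_genes(keep, ["linker"]):
        print(gene)
```

Both domain orders are enumerated because the relative orientation of the two binding sites on the enzyme surface is not known in advance.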

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND
CORRESPONDING DNAS

While the SBD may be produced by recombinant DNA
techniques, an advantage inhering from the use of a mini-protein
as an IPBD is that it is likely that the derived SBD
will also behave like a mini-protein and will be obtainable
by means of chemical synthesis. (The term "chemical
synthesis", as used herein, includes the use of enzymatic
agents in a cell-free environment.)

It is also to be understood that mini-proteins obtained
by the method of the present invention may be taken as lead
compounds for a series of homologues that contain
non-naturally occurring amino acids and groups other than amino
acids. For example, one could synthesize a series of
homologues in which each member of the series has one amino
acid replaced by its D enantiomer. One could also make
homologues containing constituents such as β-alanine,
aminobutyric acid, 3-hydroxyproline, 2-aminoadipic acid,
N-ethylasparagine, norvaline, etc.; these would be tested for
binding and other properties of interest, such as stability
and toxicity.

Peptides may be chemically synthesized either in
solution or on supports. Various combinations of stepwise
synthesis and fragment condensation may be employed.

During synthesis, the amino acid side chains are
protected to prevent branching. Several different
protective groups are useful for the protection of the thiol
groups of cysteines:

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88), removable
with HF;

2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89c),
removable with iodine; mercury ions (e.g., mercuric
acetate); silver nitrate; and

3) S-para-methoxybenzyl (HOUG84).

Other thiol protective groups may be found in standard
reference works such as Greene, PROTECTIVE GROUPS IN ORGANIC
SYNTHESIS (1981).

Once the polypeptide chain has been synthesized,
disulfide bonds must be formed. Possible oxidizing agents
include air (HOUG84; NISH86), ferricyanide (NISH82; HOUG84),
iodine (NISH82), and performic acid (HOUG84). Temperature,
pH, solvent, and chaotropic chemicals may affect the course
of the oxidation.

A large number of micro-proteins with a plurality of
disulfide bonds have been chemically synthesized in
biologically active form: conotoxin GI (13 AA, 4 Cys) (NISH82);
heat-stable enterotoxin ST (18 AA, 6 Cys) (HOUG84);
analogues of ST (BHAT86); ω-conotoxin GVIA (27 AA, 6 Cys)
(NISH86; RIVI87b); ω-conotoxin MVIIA (27 AA, 6 Cys) (OLIV87b);
α-conotoxin SI (13 AA, 4 Cys) (ZAFA88); μ-conotoxin IIIA
(22 AA, 6 Cys) (BECK89c, CRUZ89, HATA90). Sometimes, the
polypeptide naturally folds so that the correct disulfide
bonds are formed. Other times, it must be helped along by
use of a differently removable protective group for each
pair of cysteines.

The successful binding domains of the present invention
may, alone or as part of a larger protein, be used for any
purpose for which binding proteins are suited, including
isolation or detection of target materials. In furtherance
of this purpose, the novel binding proteins may be coupled
directly or indirectly, covalently or noncovalently, to a
label, carrier or support.

When used as a pharmaceutical, the novel binding
proteins may be contained with suitable carriers or
adjuvants.

EXAMPLE X

DESIGN AND MUTAGENESIS OF A CLASS 1 MICRO-PROTEIN

To obtain a library of binding domains that are
conformationally constrained by a single disulfide, we
insert DNA coding for the following family of micro-proteins
into the gene coding for a suitable OSP.

     X1 - C - X2 - X3 - X4 - X5 - C - X6
          |________________________|

where |____| indicates disulfide bonding. Disulfides
normally do not form between cysteines that are consecutive
on the polypeptide chain. One or more of the residues
indicated above as X will be varied extensively to obtain
novel binding. There may be one or more amino acids that
precede X1 or follow X6; however, the residues before X1 or
after X6 will not be significantly constrained by the
diagrammed disulfide bridge, and it is less advantageous to
vary these remote, unbridged residues. The last X residue
is connected to the OSP of the genetic package.

X1, X2, X3, X4, X5, and X6 can be varied independently;
a different scheme of variegation could be used at each
position. X1 and X6 are the least constrained residues and
may be varied less than other positions.

X1 and X6 can be, for example, one of the amino acids
[E, K, T, and A]; this set of amino acids is preferred
because: a) the possibility of positively charged, negatively
charged, and neutral amino acids is provided, b) these
amino acids can be provided in 1:1:1:1 ratio via the codon
RMG (R = equimolar A and G, M = equimolar A and C), and c)
these amino acids allow proper processing by signal
peptidases.
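
The substitution set produced by a degenerate codon can be checked by expanding it against the standard genetic code. The following sketch is illustrative; the helper names are hypothetical, and only the standard code and the IUPAC meanings of R (A or G), M (A or C), and K (G or T) are assumed.

```python
# Illustrative sketch: expand a degenerate codon using the standard genetic code.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC ambiguity codes needed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "M": "AC", "K": "GT", "N": "ACGT"}


def encoded(degenerate_codon):
    """Map each concrete codon of a degenerate codon to its amino acid ('*' = stop)."""
    codons = ("".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon)))
    return {codon: CODON_TABLE[codon] for codon in codons}


if __name__ == "__main__":
    for codon, aa in sorted(encoded("RMG").items()):
        print(codon, aa)
```

RMG expands to exactly four codons, one each for K, T, E, and A, which is why the four amino acids appear in a 1:1:1:1 ratio. By contrast, a codon beginning with K (G or T) in the same scheme would include the stop codon TAG.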

In a preferred embodiment, Xa, X* and Xj are
initially variegated by encoding each by the codon NNT,
which encodes the substitution set [F, S, Y, C, L, P, H, R,
I, T, N, V, A, D, and G].

The advantages of the NNT over the NNK codon become
increasingly apparent as the number of variegated codons
increases. Tables 10 and 130 compare libraries in which six
codons have been varied either by NNT or NNK codons. NNT
encodes 15 different amino acids with only 16 DNA sequences.
Thus, there are 1.139 x 10^7 amino-acid sequences, no stops,
and only 1.678 x 10^7 DNA sequences. A library of 10^8
independent transformants will contain 99% of all possible
sequences. The NNK library contains 6.4 x 10^7 amino-acid
sequences encoded by 32^6 (about 1.07 x 10^9) DNA sequences,
so complete sampling requires a much larger number of
independent transformants.
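
The library sizes quoted for six variegated codons can be reproduced, and the sampling claim examined, with the sketch below. The helper names are hypothetical. The coverage estimate assumes that all DNA sequences are equally likely and uses the Poisson approximation 1 - exp(-N/S) for N transformants and S sequences; that statistical model is our assumption for illustration, not a statement of how the tables referred to above were computed.

```python
# Illustrative sketch: library sizes for NNT vs NNK and expected sampling coverage.
import math
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "N": "ACGT"}


def codon_stats(degenerate_codon):
    """Return (number of codons, number of distinct amino acids, number of stop codons)."""
    codons = ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sum(CODON_TABLE[c] == "*" for c in codons)
    return len(codons), len(amino_acids), stops


def coverage(n_transformants, n_sequences):
    """Expected fraction of equiprobable sequences present at least once."""
    return 1.0 - math.exp(-n_transformants / n_sequences)


if __name__ == "__main__":
    positions = 6
    for codon in ("NNT", "NNK"):
        n_codons, n_aa, n_stop = codon_stats(codon)
        n_dna = n_codons ** positions
        print(f"{codon}: {n_aa ** positions:.3e} protein, {n_dna:.3e} DNA, "
              f"{n_stop} stop codon(s), coverage at 1e8 = {coverage(1e8, n_dna):.3f}")
```

Under these assumptions 10^8 transformants sample more than 99% of the NNT DNA sequences but fewer than 10% of the NNK DNA sequences.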

This sequence can be displayed as a fusion to the gene
III protein of M13 using the native M13 gene III promoter
and signal sequence. The sequence of M13 gene III protein,
from residue 16 to 23, is S16-H-S-A-E-T-V-E23; signal peptidase-I
cleaves after S18. We replace this segment with
S16-G-A18-A-E-G-X1-C-X2-X3-X4-X5-C-X6-S-Y-I-E-G-R-V-I-E-T-V-E.
Note that changing H17-S18 to GA does not impair the phage for
infectivity. It is useful to insert a bovine F.Xa
recognition/cleavage site (YIEGR/VI) between the PBD and the
mature III protein; this not only allows orientational
freedom for the PBD, but also allows cleavage of the PBD
from the GP.
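
The effect of the Factor Xa site can be illustrated by cutting the displayed sequence after the arginine of IEGR. The sketch below is an illustration only: the function name is hypothetical, "X" marks a variegated residue, and only the portion of the construct from X1 onward is modeled.

```python
# Illustrative sketch: cleavage after the Arg of the Factor Xa site Ile-Glu-Gly-Arg.

FXA_SITE = "IEGR"  # cleavage occurs immediately after the final R


def cleave_factor_xa(sequence):
    """Split at the first IEGR site; return (released part, part left on the package)."""
    index = sequence.find(FXA_SITE)
    if index < 0:
        return sequence, ""  # no site: nothing is cleaved
    cut = index + len(FXA_SITE)
    return sequence[:cut], sequence[cut:]


if __name__ == "__main__":
    # X1 C X2 X3 X4 X5 C X6, then S, the YIEGR site, and the start of mature III.
    displayed = "XCXXXXCX" + "S" + "YIEGR" + "VIETVE"
    released, retained = cleave_factor_xa(displayed)
    print("released:", released)
    print("retained:", retained)
```

After cleavage every phage carries the same residues at the amino terminus of the III protein, regardless of which binding domain it displayed.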

A phage library in which X1, X2, X5, and X6 are encoded
by NNT (allowing F, S, Y, C, L, P, H, R, I, T, N, V, A, D,
& G) and in which X3 and X4 are encoded by NNG (allowing L,
S, W, P, Q, R, M, T, K, V, A, E, and G) is named TN2. This
library displays about 8.55 x 10^6 micro-proteins encoded by
about 1.5 x 10^7 DNA sequences. NNG is used at the third and
fourth variable positions (the central positions of the
disulfide-closed loop) at least in part to avoid the
possibility of cysteines at these positions.
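
The size of the TN2 library follows from the per-position substitution sets. The sketch below is illustrative; the names are hypothetical. It counts distinct amino-acid sequences, and counts DNA sequences after excluding the single stop codon (TAG) that NNG permits, which is our reading of how the figure of about 1.5 x 10^7 arises.

```python
# Illustrative sketch: size of a library with a different degenerate codon per position.
from itertools import product
from math import prod

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

# X1, X2, X5, X6 by NNT; X3, X4 by NNG, as described for TN2.
TN2_SCHEME = ["NNT", "NNT", "NNG", "NNG", "NNT", "NNT"]


def translate_all(degenerate_codon):
    """Amino acids ('*' = stop) for every concrete codon of a degenerate codon."""
    codons = ("".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon)))
    return [CODON_TABLE[c] for c in codons]


def library_size(scheme):
    """Return (distinct amino-acid sequences, stop-free DNA sequences)."""
    n_protein = prod(len(set(translate_all(c)) - {"*"}) for c in scheme)
    n_dna = prod(sum(aa != "*" for aa in translate_all(c)) for c in scheme)
    return n_protein, n_dna


if __name__ == "__main__":
    n_protein, n_dna = library_size(TN2_SCHEME)
    print(f"micro-proteins: {n_protein:.3e}; stop-free DNA sequences: {n_dna:.3e}")
    print("Cys possible at NNG positions:", "C" in translate_all("NNG"))
```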

Devlin, et al., screened 10^7 transformants, each of
which could display one of 10^12 random pentadecapeptides, for
affinity with streptavidin, and found 20 streptavidin-binding
phage isolates, with eight unique sequences (lettered
from "A"). All contained HP; 15/20, HPQ; and 6/20, HPQF, though
in different positions within the pentadecapeptide. The
most frequently encountered isolates were D (5), I (4), and
A (3), which entirely lacked cysteines. However, two
positive isolates, "E" (1) and "F" (2), included a pair of
cysteines positioned so that formation of a disulfide bond
was possible. The sequences of these isolates are given in
Table 820.



We reasoned that our TN2 library should include a
putative micro-protein, HPQ, similar enough to Devlin's "E"
and "F" peptides to have the potential of exhibiting
streptavidin-binding activity. HPQ comprises the AEG amino
terminal sequence common to all members of the TN2 library,
followed by the sequence PCHPQFCQ, which has the potential
for forming a disulfide bridge with a span of four, followed
by a serine (S) and a bovine factor Xa recognition site
(YIEGR/IV) (see Table 820). Pilot experiments showed that
the binding of HPQ-bearing phage to streptavidin was
comparable to that of Devlin's "F" isolate; both were
marginally above background (1.7x). We therefore screened
our TN2 library against immobilized streptavidin.

Streptavidin is available as free protein (Pierce) with
a specific activity of 14.6 units per mg (1 unit will bind
1 µg of biotin). A stock solution of 1 mg per mL in PBS
containing 0.01% azide is made. 100 µL of StrAv stock is
added to each well of Immulon (#4) plates
and incubated overnight at 4°C. The stock is removed and
replaced with 250 µL of PBS containing BSA at a
concentration of 1 mg/mL and left at 4°C for a further 1
hour. Prior to use in a phage binding assay the wells are
washed rapidly 5 times with 250 µL of PBS containing 0.1%
Tween.

To each StrAv-coated well is added 100 µL of binding
buffer (PBS with 1 mg per mL BSA) containing a known
quantity of phage (10^11 pfu's of the TN2 library).
Incubation proceeds for 1 hr at room temperature, followed by
removal of the non-bound phage and 10 rapid washes with PBS
containing 0.1% Tween; the wells are then washed further with
citrate buffers of pH 7, 6 and 5 to remove non-specifically
bound phage. The bound phage are eluted with 250 µL of pH 2
citrate buffer containing 1 mg per mL BSA, and the eluate is
neutralized with 60 µL of 1 M Tris pH 8. The eluate was used
to infect bacterial cells, which generated a new phage stock
to be used for a further round of binding, washing and
elution. The enhancement cycles were repeated two more times
(three in total), after which a number of individual phage
were sequenced and tested as clonal isolates. The number of
phage present in each step is determined as plaque forming
units (pfu's) following appropriate dilutions and plating in
a lawn of F'-containing E. coli.
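
The titering arithmetic behind the pfu determinations can be sketched as follows. This is only an illustration: the function names, the plaque count, the dilution, and the plated volume are hypothetical and are not data from the experiments described here. The only values taken from the text are the 10^11 pfu input and the eluate volume (250 µL of citrate buffer plus 60 µL of Tris).

```python
def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Phage titer of the undiluted sample, in pfu per mL.

    plaques          -- plaques counted on the plate
    dilution         -- total dilution of the plated sample (e.g. 1e-6)
    volume_plated_ml -- volume of diluted sample plated, in mL
    """
    return plaques / (dilution * volume_plated_ml)


def fraction_recovered(output_pfu, input_pfu):
    """Fraction of the input phage recovered in the eluate of one round."""
    return output_pfu / input_pfu


# Hypothetical plate: 150 plaques from 0.1 mL of a 10^-6 dilution of eluate.
eluate_titer = titer_pfu_per_ml(150, 1e-6, 0.1)   # pfu per mL of eluate
eluate_pfu = eluate_titer * 0.31                  # 250 uL eluate + 60 uL Tris
recovered = fraction_recovered(eluate_pfu, 1e11)  # input was 10^11 pfu
print(f"{eluate_titer:.2e} pfu/mL, fraction recovered {recovered:.2e}")
```

Comparing the fraction recovered from one round to the next gives a simple measure of enrichment.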

Table 838 shows the peptide sequences found to bind to 
StrAv and their frequency in the random picks taken from the 
final (round 3) phage pool. 

The intercysteine segment of all of the putative micro-
proteins examined contained the HPQF motif. The variable
residue before the first cysteine could have contained any
of {F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G}; the residues selected
were {Y,H,L,D,N} while phage HPQ has P. The variable
residue after the second cysteine also could have had
{F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G}; the residues selected were
{P,S,G,R,V} while phage HPQ has Q. The relatively poor
binding of phage HPQ could be due to P4 or to Q11 or both.

In a control experiment, the TN2 library was screened
in an identical manner to that shown above but with the
target protein being the blocking agent BSA. Following
three rounds of binding, elution, and amplification, sixteen
random phage plaques were picked and sequenced. Half of the
clones demonstrated a lack of insert (8/16); the other half
had the sequences shown in Table 839. There is no consensus
for this collection.

We have displayed a related micro-protein, HPQ6, on
phage. It is identical to HPQ except for the replacement of
CHPQFC with CHPQFPRC (see Table 820). When displayed, HPQ6
had a substantially stronger affinity for streptavidin than
either HPQ or Devlin's "F" isolate. (Devlin's "E" isolate
was not studied.) Treatment with dithiothreitol (DTT)
markedly reduced the binding of HPQ6 phage (but not control
phage) to streptavidin, suggesting that the presence of a
disulfide bridge within the displayed peptide was required
for good binding. In view of the results of the screening
of the TN2 library, it is likely that the binding of phage
HPQ6 could be further improved by changing P4 to one of
{Y,H,L,D,N} and/or changing Q13 to one of {P,S,G,R,V}.



EXAMPLE II
A CYS::HELIX::TURN::STRAND::CYS UNIT
The parental Class 2 micro-protein may be a naturally-
occurring Class 2 micro-protein. It may also be a domain of
a larger protein whose structure satisfies or may be
modified so as to satisfy the criteria of a Class 2 micro-
protein. The modification may be a simple one, such as the
introduction of a cysteine (or a pair of cysteines) into the
base of a hairpin structure so that the hairpin may be
closed off with a disulfide bond, or a more elaborate one,
such as the modification of intermediate residues so as to
achieve the hairpin structure. The parental Class 2 micro-
protein may also be a composite of structures from two or
more naturally-occurring proteins, e.g., an α helix of one
protein and a β strand of a second protein.

One micro-protein motif of potential use comprises a
disulfide loop enclosing a helix, a turn, and a return
strand. Such a structure could be designed or it could be
obtained from a protein of known 3D structure. Scorpion
neurotoxin, variant 3, (ALMA83a, ALMA83b) (hereafter
ScorpTx) contains a structure diagrammed in Figure 1 that
comprises a helix (residues N22 through N33), a turn
(residues 33 through 35), and a return strand (residues 36
through 41). ScorpTx contains disulfides that join residues
12-65, 16-41, 25-46, and 29-48. CYS25 and CYS41 are quite
close and could be joined by a disulfide without deranging
the main chain. Figure 1 shows CYS25 joined to CYS41. In
addition, CYS29 has been changed to GLN. It is expected that
a disulfide will form between 25 and 41 and that the helix
shown will form; we know that the amino-acid sequence shown
is highly compatible with this structure. The presence of
GLY35, GLY36, and GLY39 gives the turn and extended strand
sufficient flexibility to accommodate any changes needed
around CYS41 to form the disulfide.

From examination of this structure (as found in entry
1SN3 of the Brookhaven Protein Data Bank), we see that the
following sets of residues would be preferred for variegation:

SET 1
   Residue   Codon   Allowed amino acids   Naa/Ndna
1) T27       NNG     L²R²MVSPTAQKEWG.      13/15
2) E28       VHG     LMVPTAQKE             9/9
3) A31       VHG     LMVPTAQKE             9/9
4) K32       VHG     LMVPTAQKE             9/9
5) G24       NNG     L²R²MVSPTAQKEWG.      13/15
6) E23       VHG     LMVPTAQKE             9/9
7) Q34       VAS     HQNKED                6/6

Note: Exponents on amino acids indicate multiplicity of
codons.

Positions 27, 28, 31, 32, 24, and 23 comprise one face
of the helix. At each of these locations we have picked a
variegating codon that a) includes the parental amino acid,
b) includes a set of residues having a predominance of helix-
favoring residues, c) provides for a wide variety of amino
acids, and d) leads to as even a distribution as possible.
Position 34 is part of a turn. The side group of residue 34
could interact with molecules that contact the side groups
of residues 27, 28, 31, 32, 24, and 23. Thus we allow
variegation here and provide amino acids that are compatible
with turns. The variegation shown leads to 6.65×10^6 amino-
acid sequences encoded by 8.85×10^6 DNA sequences.
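
The Naa/Ndna entries and the library totals for SET 1 can be checked with a short script. The sketch below is not part of the original method: it assumes the standard genetic code and IUPAC degenerate-base symbols, counts only stop-free codons for Ndna (which matches the 13/15 convention used for NNG), and the helper names `expand` and `naa_ndna` are hypothetical.

```python
from itertools import product
from math import prod

# IUPAC degenerate-base symbols.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(vg_codon):
    """Return {amino acid: number of codons} for a variegated codon."""
    counts = {}
    for bases in product(*(IUPAC[s] for s in vg_codon)):
        aa = CODE["".join(bases)]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def naa_ndna(vg_codon):
    """Number of amino acids and of stop-free codons for a variegated codon."""
    stop_free = {aa: n for aa, n in expand(vg_codon).items() if aa != "*"}
    return len(stop_free), sum(stop_free.values())


# SET 1: positions 27, 28, 31, 32, 24, 23 and 34.
set1 = ["NNG", "VHG", "VHG", "VHG", "NNG", "VHG", "VAS"]
n_aa = prod(naa_ndna(c)[0] for c in set1)
n_dna = prod(naa_ndna(c)[1] for c in set1)
print(n_aa, n_dna)
```

The products come to about 6.65×10^6 amino-acid sequences and about 8.86×10^6 DNA sequences, in line with the totals stated above.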



SET 2
   Residue   Codon   Allowed amino acids   Naa/Ndna
1) D26       VHS     L²IMV²P²T²A²HQNKDE    13/18
2) T27       NNG     L²R²MVSPTAQKEWG.      13/15
3) K30       VHG     KEQPTALMV             9/9
4) A31       VHG     KEQPTALMV             9/9
5) K32       VHG     LMVPTAQKE             9/9
6) S37       RRT     SNDG                  4/4
7) Y38       NHT     YSFHPLNTIDAV          9/9

Positions 26, 27, 30, 31, and 32 are variegated so as
to enhance helix-favoring amino acids in the population.
Residues 37 and 38 are in the return strand so that we pick
different variegation codons. This variegation allows
4.43×10^6 amino-acid sequences and 7.08×10^6 DNA sequences.
Thus a library that embodies this scheme can be sampled very
efficiently.

EXAMPLE III

DESIGN AND MUTAGENESIS OF CLASS 3 MICRO-PROTEIN

Two Disulfide Bond Parental Micro-Proteins

Micro-proteins with two disulfide bonds may be modelled
after the α-conotoxins, e.g., GI, GIA, GII, MI, and SI.
These have the following conserved structure:

          1 2         1'        2'
(1-2 AAs)-C-C-(3 AAs)-C-(5 AAs)-C-(0-5 AAs)


Hashimoto et al. (HASH85) reported synthesis of twenty-
four analogues of α-conotoxins GI, GII, and MI. Using the
numbering scheme for GI (CYS at positions 2, 3, 7, and 13),
Hashimoto et al. reported alterations at 4, 8, 10, and 12
that allow the proteins to be toxic. Almquist et al.
(ALMQ89) synthesized [des-GLU1] α-conotoxin GI and twenty
analogues. They found that substituting GLY for PRO5 gave
rise to two isomers, perhaps related to different disulfide
bonding. They found a number of substitutions at residues
8 through 11 that allowed the protein to be toxic. Zafaralla
et al. (ZAFA88) found that substituting PRO at position
9 gives an active protein. Each of the groups cited used
only in vivo toxicity as an assay for the activity. From
such studies, one can infer that an active protein has the
parental 3D structure, but one cannot infer that an
inactive protein lacks the parental 3D structure.

Pardi et al. (PARD89) determined the 3D structure of
α-conotoxin GI obtained from venom by NMR. Kobayashi et al.
(KOBA89) have reported a 3D structure of synthetic
α-conotoxin GI from NMR data which agrees with that of PARD89.
We refer to Figure 5 of Pardi et al.

Residue GLU1 is known to accommodate GLU, ARG, and ILE
in known analogues or homologues. A preferred variegation
codon is NNG, which allows the set of amino acids
[L²R²MVSPTAQKEWG<stop>]. From Figure 5 of Pardi et al. we see
that the side group of GLU1 projects into the same region as
the strand comprising residues 9 through 12. Residues 2 and 3
are cysteines and are not to be varied. The side group of
residue 4 points away from residues 9 through 12; thus we
defer varying this residue until a later round. PRO5 may be
needed to cause the correct disulfides to form; when GLY was
substituted here the peptide folded into two forms, neither
of which is toxic. It is allowed to vary PRO5, but not
preferred in the first round.

No substitutions at ALA6 have been reported. A
preferred variegation codon is RMG, which gives rise to ALA,
THR, LYS, and GLU (small hydrophobic, small hydrophilic,
positive, and negative). CYS7 is not varied. We prefer to
leave GLY8 as is, although a homologous protein having ALA8
is toxic. Homologous proteins having various amino acids at
position 9 are toxic; thus, we use an NNT variegation codon,
which allows FS²YCLPHRITNVADG. We use NNT at positions 10,
11, and 12 as well. At position 14, following the fourth
CYS, we allow ALA, THR, LYS, or GLU (via an RMG codon).
This variegation allows 1.053×10^7 amino-acid sequences,
encoded by 1.68×10^7 DNA sequences. Libraries having 2.0×10^7,
3.0×10^7, and 5.0×10^7 independent transformants will,
respectively, display ~70%, ~83%, and ~95% of the allowed
sequences. Other variegations are also appropriate.
Concerning α-conotoxins, see, inter alia, ALMQ89, CRUZ85,
GRAY83, GRAY84, and PARD89.
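
The quoted percentages are consistent with a simple sampling model. The sketch below assumes that every DNA sequence is equally likely and that transformants are independent, so that the expected fraction of sequences seen at least once is 1 - exp(-N/D); this model is an assumption made for illustration, and the function name is hypothetical.

```python
import math


def fraction_displayed(n_transformants, n_dna_sequences):
    """Expected fraction of equally likely DNA sequences that appear
    at least once among n_transformants independent transformants."""
    return 1.0 - math.exp(-n_transformants / n_dna_sequences)


# The alpha-conotoxin variegation above encodes 1.68 x 10^7 DNA sequences.
for n in (2.0e7, 3.0e7, 5.0e7):
    print(f"{n:.1e} transformants: {fraction_displayed(n, 1.68e7):.1%}")
```

Under these assumptions the three library sizes give roughly 70%, 83%, and 95%, matching the figures in the text.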

The parental micro-protein may instead be one of the
proteins designated "Hybrid-I" and "Hybrid-II" by Pease et
al. (PEAS90); cf. Figure 4 of PEAS90. One preferred set of
residues to vary for either protein consists of:

Parental     Variegated   Allowed             AA seqs/
Amino acid   Codon        Amino acids         DNA seqs
A5           RVT          ADGTNS              6/6
P6           VYT          PTALIV              6/6
E7           RRS          EDNKSRG²            7/8
T8           VHG          TPALMVQKE           9/9
A9           VHG          ATPLMVQKE           9/9
A10          RMG          AEKT                4/4
K12          VHG          KQETPALMV           9/9
Q16          NNG          L²R²S.WPQMTKVAEG    13/15

This provides 9.55×10^6 amino-acid sequences encoded by
1.26×10^7 DNA sequences. A library comprising 5.0×10^7
transformants allows expression of 98.2% of all possible
sequences. At each position, the parental amino acid is
allowed.

At position 5 we provide amino acids that are compatible
with a turn. At position 6 we allow ILE and VAL because
they have branched β carbons and make the chain rigid. At
position 7 we allow ASP, ASN, and SER, which often appear at
the amino termini of helices. At positions 8 and 9 we allow
several helix-favoring amino acids (ALA, LEU, MET, GLN, GLU,
and LYS) that have differing charges and hydrophobicities
because these are part of the helix proper. Position 10 is
further around the edge of the helix, so we allow a smaller
set (ALA, THR, LYS, and GLU). This set not only includes 3
helix-favoring amino acids plus THR, which is well tolerated,
but also allows positive, negative, and neutral hydrophilic
residues. The side groups of 12 and 16 project into the same
region as the residues already recited. At these positions we
allow a wide variety of amino acids with a bias toward helix-
favoring amino acids.

The parental micro-protein may instead be a polypeptide
composed of residues 9-24 and 31-40 of aprotinin and
possessing two disulfides (Cys9-Cys22 and Cys14-Cys38).
Such a polypeptide would have the same disulfide bond
topology as α-conotoxin, and its two bridges would have
spans of 12 and 17, respectively.

Residues 23, 24 and 31 are variegated to encode the
amino acid residue set [G,S,R,D,N,H,P,T,A] so that a
sequence that favors a turn of the necessary geometry is
found. We use trypsin or anhydrotrypsin as the affinity
molecule to enrich for GPs that display a micro-protein that
folds into a stable structure similar to BPTI in the P1
region.

Three Disulfide Bond Parental Micro-Proteins

The cone snails (Conus) produce venoms (conotoxins)
which are 10-30 amino acids in length and exceptionally rich
in disulfide bonds. They are therefore archetypal micro-
proteins. Novel micro-proteins with three disulfide bonds
may be modelled after the μ-(GIIIA, GIIIB, GIIIC) or
ω-(GVIA, GVIB, GVIC, GVIIA, GVIIB, MVIIA, MVIIB, etc.)
conotoxins. The μ-conotoxins have the following conserved
structure:

        1 2         3         1'        2'3'
(2 AAs)-C-C-(5 AAs)-C-(4 AAs)-C-(4 AAs)-C-C-AA


No 3D structure of a μ-conotoxin has been published.
Hidaka et al. (HIDA90) have established the connectivity of
the disulfides. The following diagram depicts geographutoxin
I (also known as μ-conotoxin GIIIA).

R1-D2-C3-C4-T5-P6-P7-K8-K9-C10-K11-D12-R13-Q14-C15-K16-P17-Q18-R19-C20-C21-A22

Disulfides: C3::C15, C4::C20, C10::C21


The connection from R19 to C20 could go over or under the
strand from Q14 to C15. One preferred form of variegation
is to vary the residues in one loop. Because the longest
loop contains only five amino acids, it is appropriate to
also vary the residues connected to the cysteines that form
the loop. For example, we might vary residues 5 through 9
plus 2, 11, 19, and 22. Another useful variegation would be
to vary residues 11-14 and 16-19, each through eight amino
acids. Concerning μ-conotoxins, see BECK89b, BECK89c,
CRUZ89, and HIDA90.

The ω-conotoxins may be represented as follows:

1         2         3 1'          2'          3'
C-(6 AAs)-C-(6 AAs)-C-C-(2-3 AAs)-C-(4-6 AAs)-C

The King Kong peptide has the same disulfide arrangement as
the ω-conotoxins but a different biological activity.
Woodward et al. (WOOD90) report the sequences of three
homologous proteins from C. textile. Within the mature
toxin domain, only the cysteines are conserved. The spacing
of the cysteines is exactly conserved, but no other position
has the same amino acid in all three sequences and only a
few positions show even pair-wise matches. Thus we conclude
that all positions (except the cysteines) may be substituted
freely with a high probability that a stable disulfide
structure will form. Concerning ω-conotoxins, see HILL89
and SDNX87.

Another micro-protein which may be used as a parental
binding domain is the Cucurbita maxima trypsin inhibitor I
(CMTI-I); CMTI-III is also appropriate. They are members of
the squash family of serine protease inhibitors, which also
includes inhibitors from summer squash, zucchini, and
cucumbers (WIEC85). McWherter et al. (MCWH89) describe
synthetic sequence-variants of the squash-seed protease
inhibitors that have affinity for human leukocyte elastase
and cathepsin G. Of course, any member of this family might
be used.

CMTI-I is one of the smallest proteins known, comprising
only 29 amino acids held in a fixed conformation by
three disulfide bonds. The structure has been studied by
Bode and colleagues using both X-ray diffraction (BODE89)
and NMR (HOLA89a,b). CMTI-I is of ellipsoidal shape; it
lacks helices or β-sheets, but consists of turns and
connecting short polypeptide stretches. The disulfide
pairing is Cys3-Cys20, Cys10-Cys22 and Cys16-Cys28. In the
CMTI-I:trypsin complex studied by Bode et al., 13 of the 29
inhibitor residues are in direct contact with trypsin; most
of them are in the primary binding segment Val2 (P4)-Glu9
(P4'), which contains the reactive site bond Arg5 (P1)-Ile6
and is in a conformation observed also for other serine
proteinase inhibitors.

CMTI-I has a Ki for trypsin of ~1.5×10^-12 M. McWherter
et al. suggested substitution of "moderately bulky hydrophobic
groups" at P1 to confer HLE specificity. They found
that a wider set of residues (VAL, ILE, LEU, ALA, PHE, MET,
and GLY) gave detectable binding to HLE. For cathepsin G,
they expected bulky (especially aromatic) side groups to be
strongly preferred. They found that PHE, LEU, MET, and ALA
were functional by their criteria; they did not test TRP,
TYR, or HIS. (Note that ALA has the second smallest side
group available.)

A preferred initial variegation strategy would be to
vary some or all of the residues ARG1, VAL2, PRO4, ARG5, ILE6,
LEU7, MET8, GLU9, LYS11, HIS25, GLY26, TYR27, and GLY29. If the
target were HNE, for example, one could synthesize DNA
embodying the following possibilities:

            vg       Allowed          #AA seqs/
Parental    Codon    amino acids      #DNA seqs
ARG1        VNT      RSLPHITNVADG     12/12
VAL2        NWT      VILFYHND         8/8
PRO4        VYT      PLTIAV           6/6
ARG5        VNT      RSLPHITNVADG     12/12
ILE6        NNK      all 20           20/31
LEU7        VWG      LQMKVE           6/6
TYR27       NAS      YHQNKDE.         7/8

This allows about 5.81×10^6 amino-acid sequences encoded by
about 1.03×10^7 DNA sequences. A library comprising 5.0×10^7
independent transformants would give ~99% of the possible
sequences. Other variegation schemes could also be used.

Other inhibitors of this family include:

trypsin inhibitor I from Citrullus vulgaris (OTLE87),
trypsin inhibitor II from Bryonia dioica (OTLE87),
trypsin inhibitor I from Cucurbita maxima (in OTLE87),
trypsin inhibitor III from Cucurbita maxima (in OTLE87),
trypsin inhibitor IV from Cucurbita maxima (in OTLE87),
trypsin inhibitor II from Cucurbita pepo (in OTLE87),
trypsin inhibitor III from Cucurbita pepo (in OTLE87),
trypsin inhibitor IIb from Cucumis sativus (in OTLE87),
trypsin inhibitor IV from Cucumis sativus (in OTLE87),
trypsin inhibitor II from Ecballium elaterium (FAVE89), and
inhibitor CM-1 from Momordica repens (in OTLE87).

Another micro-protein that may be used as an initial
potential binding domain is one of the heat-stable
enterotoxins derived from some enterotoxigenic E. coli,
Citrobacter freundii, and other bacteria (GUAR89). These
micro-proteins are known to be secreted from E. coli and are
extremely stable. Works related to synthesis, cloning,
expression and properties of these proteins include: BHAT86,
SEKI85, SHIM87, TAKA85, TAKE90, THOM85a,b, YOSH85, DALL90,
DWAR89, GARI87, GUZM89, GUZM90, HOUG84, KOBO89, KUPE90,
OKAM87, OKAM88, and OKAM90.

EXAMPLE IV

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF CU(II), ONE
CYSTEINE, TWO HISTIDINES, AND ONE METHIONINE.

Sequences such as
HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS and
CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS are
likely to combine with Cu(II) to form structures as shown in
the diagram:

        Xaa7-----Xaa8
        /           \
     Xaa6           Xaa9
       |              |
     Xaa5           Xaa10
        \             |
        MET4        HIS11
        /   \      /    \
       /     \    /      \
    GLY3       Cu       ASN12
      |       /  \        |
    ASN2-HIS1      CYS14-GLY13
          |          |
         NH3+       COO-


        Xaa7-----Xaa8
        /           \
     Xaa6           Xaa9
       |              |
     Xaa5           Xaa10
        \             |
        MET4        HIS11
        /   \      /    \
       /     \    /      \
    GLY3       Cu       ASN12
      |       /  \        |
    ASN2-CYS1      HIS14-GLY13
          |          |
         NH3+       COO-


Other arrangements of HIS, MET, HIS, and CYS along the chain
are also likely to form similar structures. The amino acids
ASN-GLY at positions 2 and 3 and at positions 12 and 13 give
the amino acids that carry the metal-binding ligands enough
flexibility for them to come together and bind the metal.
Other connecting sequences may be used, e.g., GLY-ASN, SER-
GLY, GLY-PRO, GLY-PRO-GLY, or PRO-GLY-ASN. It is also
possible to vary one or more residues in the loops that join
the first and second or the third and fourth metal-binding
residues. For example,

            Xaa8------Xaa9
            /            \
         Xaa7            Xaa10
          |                |
         Xaa6            Xaa11
            \              |
        /---MET5         HIS12
     Xaa4       \        /   \
       \         \      /     \
       PRO3        Cu         ASN13
         \        /  \          |
         GLY2-HIS1    CYS15-GLY14
                |      |
               NH3+   COO-

is likely to form the diagrammed structure for a wide
variety of amino acids at Xaa4. It is expected that the
side groups of Xaa4 and Xaa6 will be close together and on
the surface of the mini-protein.

The variable amino acids are held so that they have
limited flexibility. This cross-linkage has some differences
from the disulfide linkage. The separation between
Cα4 and Cα11 is greater than the separation of the Cαs of a
cystine. In addition, the interaction of residues 1 through
4 and 11 through 14 with the metal ion is expected to limit
the motion of residues 5 through 10 more than a disulfide
between residues 4 and 11. A single disulfide bond exerts
strong distance constraints on the α carbons of the joined
residues, but very little directional constraint on, for
example, the vector from N to C in the main chain.

For the desired sequence, the side groups of residues
5 through 10 can form specific interactions with the target.
Other numbers of variable amino acids, for example, 4, 5, 7,
or 8, are appropriate. Larger spans may be used when the
enclosed sequence contains segments having a high potential
to form α helices or other secondary structure that limits
the conformational freedom of the polypeptide main chain.
Whereas a mini-protein having four CYSs could form three
distinct pairings, a mini-protein having two HISs, one MET,
and one CYS can form only two distinct complexes with Cu.
These two structures are related by mirror symmetry through
the Cu. Because the two HISs are distinguishable, the
structures are different.
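
The count of three distinct pairings for four cysteines can be confirmed by enumerating every way of joining the residues in pairs. The sketch below is only illustrative; the function name and the residue labels are hypothetical.

```python
def pairings(items):
    """Return every way of grouping the items into unordered pairs."""
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    result = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in pairings(remaining):
            result.append([(first, partner)] + sub)
    return result


# Hypothetical labels for the four cysteines of a two-disulfide mini-protein.
four_cys = ["CYS_a", "CYS_b", "CYS_c", "CYS_d"]
for option in pairings(four_cys):
    print(option)
print(len(pairings(four_cys)))
```

The same enumeration applied to six cysteines gives fifteen possible pairings, which illustrates how quickly the number of alternative disulfide arrangements grows with the number of cysteines.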

When such metal-containing mini-proteins are displayed
on filamentous phage, the cells that produce the phage can
be grown in the presence of the appropriate metal ion, or
the phage can be exposed to the metal only after they are
separated from the cells.

EXAMPLE V

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF ZN(II) AND
FOUR CYSTEINES

A cross link similar to the one shown in Example IV is
exemplified by the zinc-finger proteins (GIBS88, GAUS87,
PARR88, FRAN87, CHOW87, HARD90). One family of zinc-fingers
has two CYS and two HIS residues in conserved positions that
bind Zn++ (PARR88, FRAN87, CHOW87, EVAN88, BERG88, CHAV88).
Gibson et al. (GIBS88) review a number of sequences thought
to form zinc-fingers and propose a three-dimensional model
for these compounds. Most of these sequences have two CYS
and two HIS residues in conserved positions, but some have
three CYS and one HIS residue. Gauss et al. (GAUS87) also
report a zinc-finger protein having three CYS and one HIS
residues that bind zinc. Hard et al. (HARD90) report the 3D
structure of a protein that comprises two zinc-fingers, each
of which has four CYS residues. All of these zinc-binding
proteins are stable in the reducing intracellular
environment.

One preferred example of a CYS::zinc cross-linked mini-
protein comprises residues 440 to 461 of the sequence shown
in Figure 1 of HARD90. The residues 444 through 456 may be
variegated. One such variegation is as follows:

Parental   Allowed                                   #AA / #DNA
SER444     SER, ALA                                  2 / 2
ASP445     ASP, ASN, GLU, LYS                        4 / 4
GLU446     GLU, LYS, GLN                             3 / 3
ALA447     ALA, THR, GLY, SER                        4 / 4
SER448     SER, ALA                                  2 / 2
GLY449     GLY, SER, ASN, ASP                        4 / 4
CYS450     CYS, PHE, ARG, LEU                        4 / 4
HIS451     HIS, GLN, ASN, LYS, ASP, GLU              6 / 6
TYR452     TYR, PHE, HIS, LEU                        4 / 4
GLY453     GLY, SER, ASN, ASP                        4 / 4
VAL454     VAL, ALA, ASP, GLY, SER, ASN, THR, ILE    8 / 8
LEU455     LEU, HIS, ASP, VAL                        4 / 4
THR456     THR, ILE, ASN, SER                        4 / 4

This leads to 3.77×10^7 DNA sequences that encode the same
number of amino-acid sequences. A library having 1.0×10^8
independent transformants will display 93% of the allowed
sequences; 2.0×10^8 independent transformants will display
99.5% of allowed sequences.
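
The library sizes quoted for this variegation can be recovered by inverting a simple sampling model. The sketch below assumes that all DNA sequences are equally likely and that transformants are independent, so that the number of transformants needed to display a fraction f of D sequences is -D ln(1 - f); the model and the function name are assumptions made for illustration.

```python
import math


def transformants_needed(n_dna_sequences, target_fraction):
    """Independent transformants needed so that the expected fraction of
    equally likely DNA sequences seen at least once is target_fraction."""
    return -n_dna_sequences * math.log(1.0 - target_fraction)


# 3.77 x 10^7 DNA sequences are encoded by the zinc-finger variegation above.
for target in (0.93, 0.995):
    n = transformants_needed(3.77e7, target)
    print(f"{target:.1%} of sequences: about {n:.2e} transformants")
```

Under these assumptions the two targets correspond to about 1.0×10^8 and 2.0×10^8 transformants, in agreement with the text.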
Table 2: Preferred Outer-Surface Proteins

Genetic       Preferred
Package       Outer-Surface Protein    Reason for preference

M13           coat protein (gpVIII)    a) exposed amino terminus,
                                       b) predictable post-translational
                                          processing,
                                       c) numerous copies in virion,
                                       d) fusion data available.

              gp III                   a) fusion data available,
                                       b) amino terminus exposed,
                                       c) working example available.

PhiX174       G protein                a) known to be on virion exterior,
                                       b) small enough that the G-ipbd gene
                                          can replace H gene.

E. coli       LamB                     a) fusion data available,
                                       b) non-essential.

              OmpC                     a) topological model,
                                       b) non-essential; abundant.

              OmpA                     a) topological model,
                                       b) non-essential; abundant,
                                       c) homologues in other genera.

              OmpF                     a) topological model,
                                       b) non-essential; abundant.

              PhoE                     a) topological model,
                                       b) non-essential; abundant,
                                       c) inducible.

B. subtilis   CotC                     a) no post-translational processing,
spores                                 b) distinctive sequence that causes
                                          protein to localize in spore coat,
                                       c) non-essential.

              CotD                     Same as for CotC.
Table 10: Abundances obtained
from various vgCodons

A. Optimized fxS Codon, Restrained by [D] + [E] = [K] + [R]

         T      C      A      G
1       .26    .18    .26    .30     f
2       .22    .16    .40    .22     x
3       .5     .0     .0     .5      S

Amino                    Amino
acid    Abundance        acid    Abundance
A       4.80%            C       2.86%
D       6.00%            E       6.00%
F       2.86%            G       6.60%
H       3.60%            I       2.86%
K       5.20%            L       6.82%
M       2.86%            N       5.20%
P       2.88%            Q       3.60%
R       6.82%            S       7.02% mfaa
T       4.16%            V       6.60%
W       2.86% lfaa       Y       5.20%
stop    5.20%

[D] + [E] = [K] + [R] = .12

ratio = Abun(W)/Abun(S) = 0.4074

j    (1/ratio)^j    (ratio)^j      stop-free
1    2.454          .4074          .9480
2    6.025          .1660          .8987
3    14.788         .0676          .8520
4    36.298         .0275          .8077
5    89.095         .0112          .7657
6    218.7          4.57×10^-3     .7258
7    536.8          1.86×10^-3     .6881
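
The abundances in part A follow directly from the nucleotide distribution at each codon position. The sketch below recomputes them under the assumption that the three positions are independent and that the standard genetic code applies; the function name and variable names are hypothetical.

```python
# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def abundances(p1, p2, p3):
    """Amino-acid abundances for a variegated codon.

    p1, p2, p3 -- probabilities of T, C, A, G (in that order) at codon
                  positions 1, 2 and 3, assumed independent.
    """
    result = {}
    for i in range(4):
        for j in range(4):
            for k in range(4):
                aa = AMINO[16 * i + 4 * j + k]
                result[aa] = result.get(aa, 0.0) + p1[i] * p2[j] * p3[k]
    return result


# Nucleotide distribution of part A (optimized fxS codon).
fxs = abundances((.26, .18, .26, .30), (.22, .16, .40, .22), (.5, .0, .0, .5))
present = {aa: p for aa, p in fxs.items() if aa != "*" and p > 0}
mfaa = max(present, key=present.get)             # most-favored amino acid
ratio = min(present.values()) / present[mfaa]    # least- to most-favored

for aa in sorted(fxs):
    print(f"{aa}  {100 * fxs[aa]:.2f}%")
print(f"mfaa = {mfaa}, ratio = {ratio:.4f}")
```

The same function can be applied to the distributions of parts B through E by substituting their rows.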
B. Unrestrained, optimized

         T      C      A      G
1       .27    .19    .27    .27
2       .21    .15    .43    .21
3       .5     .0     .0     .5

Amino                    Amino
acid    Abundance        acid    Abundance
A       4.05%            C       2.84%
D       5.81%            E       5.81%
F       2.84%            G       5.67%
H       4.08%            I       2.84%
K       5.81%            L       6.83%
M       2.84%            N       5.81%
P       2.85%            Q       4.08%
R       6.83%            S       6.89% mfaa
T       4.05%            V       5.67%
W       2.84% lfaa       Y       5.81%
stop    5.81%

[D] + [E] = 0.1162     [K] + [R] = 0.1264

ratio = Abun(W)/Abun(S) = 0.41176

j    (1/ratio)^j    (ratio)^j      stop-free
1    2.4286         .41176         .9419
2    5.8981         .16955         .8872
3    14.3241        .06981         .8356
4    34.7875        .02875         .7871
5    84.4849        .011836        .74135
6    205.180        .004874        .69828
7    498.3          2.007×10^-3    .6577

C. Optimized NNT

         T       C       A       G
1       .2071   .2929   .2071   .2929
2       .2929   .2071   .2929   .2071

Amino                    Amino
acid    Abundance        acid    Abundance
A       6.06%            C       4.29% lfaa
D       8.58%            E       none
F       6.06%            G       6.06%
H       8.58%            I       6.06%
K       none             L       8.58%
M       none             N       6.06%
P       6.06%            Q       none
R       6.06%            S       8.58% mfaa
T       4.29% lfaa       V       8.58%
W       none             Y       6.06%
stop    none

j    (1/ratio)^j    (ratio)^j      stop-free
1    2.0            .5             1.
2    4.0            .25            1.
3    8.0            .125           1.
4    16.0           .0625          1.
5    32.0           .03125         1.
6    64.0           .015625        1.
7    128.0          .0078125       1.

D. Optimized NNG

         T      C      A      G
1       .23    .21    .23    .33
2       .215   .285   .285   .215
3       .0     .0     .0     1.0

Amino                    Amino
acid    Abundance        acid    Abundance
A       9.40%            C       none
D       none             E       9.40%
F       none             G       7.10%
H       none             I       none
K       6.60%            L       9.50% mfaa
M       4.90%            N       none
P       6.00%            Q       6.00%
R       9.50%            S       6.60%
T       6.60%            V       7.10%
W       4.90% lfaa       Y       none
stop    6.60%

j    (1/ratio)^j    (ratio)^j      stop-free
1    1.9388         .51579         0.934
2    3.7588         .26604         0.8723
3    7.2876         .13722         0.8148
4    14.1289        .07078         0.7610
5    27.3929        3.65×10^-2     0.7108
6    53.109         1.88×10^-2     0.6639
7    102.96         9.72×10^-3     0.6200
E. Unoptimized NNS (NNK gives identical distribution)

         T      C      A      G
1       .25    .25    .25    .25
2       .25    .25    .25    .25
3       .0     .5     .0     .5

Amino                    Amino
acid    Abundance        acid    Abundance
A       6.25%            C       3.125%
D       3.125%           E       3.125%
F       3.125%           G       6.25%
H       3.125%           I       3.125%
K       3.125%           L       9.375%
M       3.125%           N       3.125%
P       6.25%            Q       3.125%
R       9.375%           S       9.375%
T       6.25%            V       6.25%
W       3.125%           Y       3.125%
stop    3.125%

j    (1/ratio)^j    (ratio)^j      stop-free
1    3.0            .33333         .96875
2    9.0            .11111         .9385
3    27.0           .03704         .90915
4    81.0           .01234567      .8807
5    243.0          .0041152       .8532
6    729.0          1.37×10^-3     .82655
7    2187.0         4.57×10^-4     .8007
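
The three numerical columns of the j tables can be regenerated from the per-codon abundances. The sketch below does so for part E, assuming that j counts the number of consecutive variegated codons and that the codons are independent; the function and parameter names are hypothetical.

```python
def run_table(p_least, p_most, p_stop, max_j=7):
    """Rows (j, (1/ratio)^j, ratio^j, stop-free fraction) for j variegated
    codons, where ratio is the abundance of the least-favored amino acid
    divided by that of the most-favored one."""
    ratio = p_least / p_most
    rows = []
    for j in range(1, max_j + 1):
        rows.append((j, (1 / ratio) ** j, ratio ** j, (1 - p_stop) ** j))
    return rows


# Part E (NNS or NNK): least-favored 1/32, most-favored 3/32, stop 1/32.
nns = run_table(p_least=1 / 32, p_most=3 / 32, p_stop=1 / 32)
for j, inverse, direct, stop_free in nns:
    print(f"{j}  {inverse:9.1f}  {direct:.3e}  {stop_free:.4f}")
```

For seven NNS codons the most-favored sequence is thus about 2187 times as abundant as the least-favored one, and about 80% of the DNA sequences are free of stop codons.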
Table 130: Sampling of a Library encoded by (NNK)^6

A. Numbers of hexapeptides in each class

total = 64,000,000 stop-free sequences.

α can be one of [WMFYCIKDENHQ]
Φ can be one of [PTAVG]
Ω can be one of [SLR]

Class      Number        Class      Number
αααααα     2985984.      Φααααα     7464960.
Ωααααα     4478976.      ΦΦαααα     7776000.
ΦΩαααα     9331200.      ΩΩαααα     2799360.
ΦΦΦααα     4320000.      ΦΦΩααα     7776000.
ΦΩΩααα     4665600.      ΩΩΩααα      933120.
ΦΦΦΦαα     1350000.      ΦΦΦΩαα     3240000.
ΦΦΩΩαα     2916000.      ΦΩΩΩαα     1166400.
ΩΩΩΩαα      174960.      ΦΦΦΦΦα      225000.
ΦΦΦΦΩα      675000.      ΦΦΦΩΩα      810000.
ΦΦΩΩΩα      486000.      ΦΩΩΩΩα      145800.
ΩΩΩΩΩα       17496.      ΦΦΦΦΦΦ       15625.
ΦΦΦΦΦΩ       56250.      ΦΦΦΦΩΩ       84375.
ΦΦΦΩΩΩ       67500.      ΦΦΩΩΩΩ       30375.
ΦΩΩΩΩΩ        7290.      ΩΩΩΩΩΩ         729.

ΦΦΩΩαα, for example, stands for the set of peptides having
two amino acids from the α class, two from Φ, and two from
Ω, arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
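
The class sizes in part A are multinomial counts. The sketch below reproduces them, assuming that α, Φ and Ω contain 12, 5 and 3 amino acids respectively (as listed above) and that a class is defined only by how many residues come from each group; the function name and the tuple keys are hypothetical.

```python
from math import factorial


def class_size(n_alpha, n_phi, n_omega):
    """Number of hexapeptides with the given number of residues from the
    alpha (12 amino acids), phi (5) and omega (3) groups, in any order."""
    orderings = factorial(6) // (
        factorial(n_alpha) * factorial(n_phi) * factorial(n_omega)
    )
    return orderings * 12 ** n_alpha * 5 ** n_phi * 3 ** n_omega


# Every (alpha, phi, omega) composition of a hexapeptide.
sizes = {
    (a, p, 6 - a - p): class_size(a, p, 6 - a - p)
    for a in range(7)
    for p in range(7 - a)
}
for composition, size in sorted(sizes.items(), reverse=True):
    print(composition, size)
print(len(sizes), sum(sizes.values()))
```

There are 28 classes, and their sizes sum to 20^6 = 64,000,000, the stated total.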
B. Probability that any given stop-free DNA
sequence will encode a hexapeptide from a
stated class.

Class      P            (% of class)
αααααα     3.364E-03    (1.13E-07)
Φααααα     1.682E-02    (2.25E-07)
Ωααααα     1.514E-02    (3.38E-07)
ΦΦαααα     3.505E-02    (4.51E-07)
ΦΩαααα     6.308E-02    (6.76E-07)
ΩΩαααα     2.839E-02    (1.01E-06)
ΦΦΦααα     3.894E-02    (9.01E-07)
ΦΦΩααα     1.051E-01    (1.35E-06)
ΦΩΩααα     9.463E-02    (2.03E-06)
ΩΩΩααα     2.839E-02    (3.04E-06)
ΦΦΦΦαα     2.434E-02    (1.80E-06)
ΦΦΦΩαα     8.762E-02    (2.70E-06)
ΦΦΩΩαα     1.183E-01    (4.06E-06)
ΦΩΩΩαα     7.097E-02    (6.08E-06)
ΩΩΩΩαα     1.597E-02    (9.13E-06)
ΦΦΦΦΦα     8.113E-03    (3.61E-06)
ΦΦΦΦΩα     3.651E-02    (5.41E-06)
ΦΦΦΩΩα     6.571E-02    (8.11E-06)
ΦΦΩΩΩα     5.914E-02    (1.22E-05)
ΦΩΩΩΩα     2.661E-02    (1.83E-05)
ΩΩΩΩΩα     4.790E-03    (2.74E-05)
ΦΦΦΦΦΦ     1.127E-03    (7.21E-06)
ΦΦΦΦΦΩ     6.084E-03    (1.08E-05)
ΦΦΦΦΩΩ     1.369E-02    (1.62E-05)
ΦΦΦΩΩΩ     1.643E-02    (2.43E-05)
ΦΦΩΩΩΩ     1.109E-02    (3.65E-05)
ΦΩΩΩΩΩ     3.992E-03    (5.48E-05)
ΩΩΩΩΩΩ     5.988E-04    (8.21E-05)
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library sizes 



Library size = 1.0000E+06
total = 9.7446E+05    % sampled = 1.52

Class      Number     %          Class      Number     %
αααααα      3362.6 (   .1)       Φααααα     16803.4 (   .2)
Ωααααα     15114.6 (   .3)       ΦΦαααα     34967.8 (   .4)
ΦΩαααα     62871.1 (   .7)       ΩΩαααα     28244.3 (  1.0)
ΦΦΦααα     38765.7 (   .9)       ΦΦΩααα    104432.2 (  1.3)
ΦΩΩααα     93672.7 (  2.0)       ΩΩΩααα     27960.3 (  3.0)
ΦΦΦΦαα     24119.9 (  1.8)       ΦΦΦΩαα     86442.5 (  2.7)
ΦΦΩΩαα    115915.5 (  4.0)       ΦΩΩΩαα     68853.5 (  5.9)
ΩΩΩΩαα     15261.1 (  8.7)       ΦΦΦΦΦα      7968.1 (  3.5)
ΦΦΦΦΩα     35537.2 (  5.3)       ΦΦΦΩΩα     63117.5 (  7.8)
ΦΦΩΩΩα     55684.4 ( 11.5)       ΦΩΩΩΩα     24325.9 ( 16.7)
ΩΩΩΩΩα      4190.6 ( 24.0)       ΦΦΦΦΦΦ      1087.1 (  7.0)
ΦΦΦΦΦΩ      5767.0 ( 10.3)       ΦΦΦΦΩΩ     12637.2 ( 15.0)
ΦΦΦΩΩΩ     14581.7 ( 21.6)       ΦΦΩΩΩΩ      9290.2 ( 30.6)
ΦΩΩΩΩΩ      3073.9 ( 42.2)       ΩΩΩΩΩΩ       408.4 ( 56.0)

Library size = 3.0000E+06
total = 2.7885E+06    % sampled = 4.36

Class      Number     %          Class      Number     %
αααααα     10076.4 (   .3)       Φααααα     50296.9 (   .7)
Ωααααα     45190.9 (  1.0)       ΦΦαααα    104432.2 (  1.3)
ΦΩαααα    187345.5 (  2.0)       ΩΩαααα     83880.9 (  3.0)
ΦΦΦααα    115256.6 (  2.7)       ΦΦΩααα    309107.9 (  4.0)
ΦΩΩααα    275413.9 (  5.9)       ΩΩΩααα     81392.5 (  8.7)
ΦΦΦΦαα     71074.5 (  5.3)       ΦΦΦΩαα    252470.2 (  7.8)
ΦΦΩΩαα    334106.2 ( 11.5)       ΦΩΩΩαα    194606.9 ( 16.7)
ΩΩΩΩαα     41905.9 ( 24.0)       ΦΦΦΦΦα     23067.8 ( 10.3)
ΦΦΦΦΩα    101097.3 ( 15.0)       ΦΦΦΩΩα    174981.0 ( 21.6)
ΦΦΩΩΩα    148643.7 ( 30.6)       ΦΩΩΩΩα     61478.9 ( 42.2)
ΩΩΩΩΩα      9801.0 ( 56.0)       ΦΦΦΦΦΦ      3039.6 ( 19.5)
ΦΦΦΦΦΩ     15587.7 ( 27.7)       ΦΦΦΦΩΩ     32516.8 ( 38.5)
ΦΦΦΩΩΩ     34975.6 ( 51.8)       ΦΦΩΩΩΩ     20215.5 ( 66.6)
ΦΩΩΩΩΩ      5879.9 ( 80.7)       ΩΩΩΩΩΩ       667.0 ( 91.5)

Table 130: Sampling of a Library encoded by (NNK)6 (continued)



Library size = 1.0000E+07
total = 8.1204E+06    % sampled = 12.69

Class      Number     %          Class      Number     %
αααααα     33455.9 (  1.1)       Φααααα    166342.4 (  2.2)
Ωααααα    148871.1 (  3.3)       ΦΦαααα    342685.7 (  4.4)
ΦΩαααα    609987.6 (  6.5)       ΩΩαααα    269958.3 (  9.6)
ΦΦΦααα    372371.8 (  8.6)       ΦΦΩααα    983416.4 ( 12.6)
ΦΩΩααα    856471.6 ( 18.4)       ΩΩΩααα    244761.5 ( 26.2)
ΦΦΦΦαα    222702.0 ( 16.5)       ΦΦΦΩαα    767692.5 ( 23.7)
ΦΦΩΩαα    972324.6 ( 33.3)       ΦΩΩΩαα    531651.3 ( 45.6)
ΩΩΩΩαα    104722.3 ( 59.9)       ΦΦΦΦΦα     68111.0 ( 30.3)
ΦΦΦΦΩα    281976.3 ( 41.8)       ΦΦΦΩΩα    450120.2 ( 55.6)
ΦΦΩΩΩα    342072.1 ( 70.4)       ΦΩΩΩΩα    122302.6 ( 83.9)
ΩΩΩΩΩα     16364.0 ( 93.5)       ΦΦΦΦΦΦ    [illegible]
ΦΦΦΦΦΩ     37179.9 ( 66.1)       ΦΦΦΦΩΩ     67719.5 ( 80.3)
ΦΦΦΩΩΩ     61580.0 ( 91.2)       ΦΦΩΩΩΩ     29586.1 ( 97.4)
ΦΩΩΩΩΩ      7259.5 ( 99.6)       ΩΩΩΩΩΩ       728.8 (100.0)

Library size = 3.0000E+07
total = 1.8633E+07    % sampled = 29.11

Class      Number     %          Class      Number     %
αααααα     99247.4 (  3.3)       Φααααα    487990.0 (  6.5)
Ωααααα    431933.3 (  9.6)       ΦΦαααα    983416.5 ( 12.6)
ΦΩαααα   1712943.0 ( 18.4)       ΩΩαααα    734284.6 ( 26.2)
ΦΦΦααα   1023590.0 ( 23.7)       ΦΦΩααα   2592866.0 ( 33.3)
ΦΩΩααα   2126605.0 ( 45.6)       ΩΩΩααα    558519.0 ( 59.9)
ΦΦΦΦαα    563952.6 ( 41.8)       ΦΦΦΩαα   1800481.0 ( 55.6)
ΦΦΩΩαα   2052433.0 ( 70.4)       ΦΩΩΩαα    978420.5 ( 83.9)
ΩΩΩΩαα    163640.3 ( 93.5)       ΦΦΦΦΦα    148719.7 ( 66.1)
ΦΦΦΦΩα    541755.7 ( 80.3)       ΦΦΦΩΩα    738960.1 ( 91.2)
ΦΦΩΩΩα    473377.0 ( 97.4)       ΦΩΩΩΩα    145189.7 ( 99.6)
ΩΩΩΩΩα     17491.3 (100.0)       ΦΦΦΦΦΦ     13829.1 ( 88.5)
ΦΦΦΦΦΩ     54058.1 ( 96.1)       ΦΦΦΦΩΩ     83726.0 ( 99.2)
ΦΦΦΩΩΩ     67454.5 ( 99.9)       ΦΦΩΩΩΩ     30374.5 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
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Table 130: Sampling of a Library encoded by (NNK) 6 

(continued)

Library size = 7.6000E+07
total = 3.2125E+07    % sampled = 50.19

Class      Number     %          Class      Number     %
αααααα    245057.8 (  8.2)       Φααααα   1175010.0 ( 15.7)
Ωααααα   1014733.0 ( 22.7)       ΦΦαααα   2255280.0 ( 29.0)
ΦΩαααα   3749112.0 ( 40.2)       ΩΩαααα   1504128.0 ( 53.7)
ΦΦΦααα   2142478.0 ( 49.6)       ΦΦΩααα   4993247.0 ( 64.2)
ΦΩΩααα   3666785.0 ( 78.6)       ΩΩΩααα    840691.9 ( 90.1)
ΦΦΦΦαα   1007002.0 ( 74.6)       ΦΦΦΩαα   2825063.0 ( 87.2)
ΦΦΩΩαα   2782358.0 ( 95.4)       ΦΩΩΩαα   1154956.0 ( 99.0)
ΩΩΩΩαα    174790.0 ( 99.9)       ΦΦΦΦΦα    210475.6 ( 93.5)
ΦΦΦΦΩα    663929.3 ( 98.4)       ΦΦΦΩΩα    808298.6 ( 99.8)
ΦΦΩΩΩα    485953.2 (100.0)       ΦΩΩΩΩα    145799.9 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15559.9 ( 99.6)
ΦΦΦΦΦΩ     56234.9 (100.0)       ΦΦΦΦΩΩ     84374.6 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)

Library size = 1.0000E+08
total = 3.6537E+07    % sampled = 57.09

Class      Number     %          Class      Number     %
αααααα    318185.1 ( 10.7)       Φααααα   1506161.0 ( 20.2)
Ωααααα   1284677.0 ( 28.7)       ΦΦαααα   2821285.0 ( 36.3)
ΦΩαααα   4585163.0 ( 49.1)       ΩΩαααα   1783932.0 ( 63.7)
ΦΦΦααα   2566085.0 ( 59.4)       ΦΦΩααα   5764391.0 ( 74.1)
ΦΩΩααα   4051713.0 ( 86.8)       ΩΩΩααα    888584.3 ( 95.2)
ΦΦΦΦαα   1127473.0 ( 83.5)       ΦΦΦΩαα   3023170.0 ( 93.3)
ΦΦΩΩαα   2865517.0 ( 98.3)       ΦΩΩΩαα   1163743.0 ( 99.8)
ΩΩΩΩαα    174941.0 (100.0)       ΦΦΦΦΦα    218886.6 ( 97.3)
ΦΦΦΦΩα    671976.9 ( 99.6)       ΦΦΦΩΩα    809757.3 (100.0)
ΦΦΩΩΩα    485997.5 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15613.5 ( 99.9)
ΦΦΦΦΦΩ     56248.9 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
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Table 130: Sampling of a Library encoded by (NNK) 6 

(continued)

Library size = 3.0000E+08
total = 5.2634E+07    % sampled = 82.24

Class      Number     %          Class      Number     %
αααααα    856451.3 ( 28.7)       Φααααα   3668130.0 ( 49.1)
Ωααααα   2854291.0 ( 63.7)       ΦΦαααα   5764391.0 ( 74.1)
ΦΩαααα   8103426.0 ( 86.8)       ΩΩαααα   2665753.0 ( 95.2)
ΦΦΦααα   4030893.0 ( 93.3)       ΦΦΩααα   7641378.0 ( 98.3)
ΦΩΩααα   4654972.0 ( 99.8)       ΩΩΩααα    933018.6 (100.0)
ΦΦΦΦαα   1343954.0 ( 99.6)       ΦΦΦΩαα   3239029.0 (100.0)
ΦΦΩΩαα   2915985.0 (100.0)       ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)       ΦΦΦΦΦα    224995.5 (100.0)
ΦΦΦΦΩα    674999.9 (100.0)       ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)

Library size = 1.0000E+09
total = 6.1999E+07    % sampled = 96.87

Class      Number     %          Class      Number     %
αααααα   2018278.0 ( 67.6)       Φααααα   6680917.0 ( 89.5)
Ωααααα   4326519.0 ( 96.6)       ΦΦαααα   7690221.0 ( 98.9)
ΦΩαααα   9320389.0 ( 99.9)       ΩΩαααα   2799250.0 (100.0)
ΦΦΦααα   4319475.0 (100.0)       ΦΦΩααα   7775990.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)       ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)       ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)       ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)       ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)       ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)

Table 130: Sampling of a Library encoded by (NNK)6

(continued)

Library size = 3.0000E+09
total = 6.3890E+07    % sampled = 99.83

Class      Number     %          Class      Number     %
αααααα   2884346.0 ( 96.6)       Φααααα   7456311.0 ( 99.9)
Ωααααα   4478800.0 (100.0)       ΦΦαααα   7775990.0 (100.0)
ΦΩαααα   9331200.0 (100.0)       ΩΩαααα   2799360.0 (100.0)
ΦΦΦααα   4320000.0 (100.0)       ΦΦΩααα   7776000.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)       ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)       ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)       ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)       ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)       ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
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Table 130, continued 

D. Formulae for tabulated quantities.

Lsize is the number of independent transformants.
31**6 is 31 to sixth power; 6*3 means 6 times 3.
A = Lsize/(31**6)
α can be one of [WMFYCIKDENHQ]
Φ can be one of [PTAVG]
Ω can be one of [SLR]

F0 = (12)**6    F1 = (12)**5    F2 = (12)**4
F3 = (12)**3    F4 = (12)**2    F5 = (12)

αααααα = F0 * (1-exp(-A))
Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
Ωααααα = 6 * 3 * F1 * (1-exp(-3*A))
ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
ΦΩαααα = (30) * 5 * 3 * F2 * (1-exp(-6*A))
ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
ΦΦΦααα = (20)*(5**3) * F3 * (1-exp(-8*A))
ΦΦΩααα = (60)*(5*5*3)*F3*(1-exp(-12*A))
ΦΩΩααα = (60)*(5*3*3)*F3*(1-exp(-18*A))
ΩΩΩααα = (20)*(3)**3*F3*(1-exp(-27*A))
ΦΦΦΦαα = (15)*(5)**4*F4*(1-exp(-16*A))
ΦΦΦΩαα = (60)*(5)**3*3*F4*(1-exp(-24*A))
ΦΦΩΩαα = (90)*(5*5*3*3)*F4*(1-exp(-36*A))
ΦΩΩΩαα = (60)*(5*3*3*3)*F4*(1-exp(-54*A))
ΩΩΩΩαα = (15)*(3)**4 * F4 *(1-exp(-81*A))
ΦΦΦΦΦα = (6)*(5)**5 * F5 * (1-exp(-32*A))
ΦΦΦΦΩα = 30*5*5*5*5*3*F5*(1-exp(-48*A))
ΦΦΦΩΩα = 60*5*5*5*3*3*F5*(1-exp(-72*A))
ΦΦΩΩΩα = 60*5*5*3*3*3*F5*(1-exp(-108*A))
ΦΩΩΩΩα = 30*5*3*3*3*3*F5*(1-exp(-162*A))
ΩΩΩΩΩα = 6*3*3*3*3*3*F5*(1-exp(-243*A))
ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
ΦΦΦΦΦΩ = 6*3*5**5*(1-exp(-96*A))
ΦΦΦΦΩΩ = 15*3*3*5**4*(1-exp(-144*A))
ΦΦΦΩΩΩ = 20*3**3*5**3*(1-exp(-216*A))
ΦΦΩΩΩΩ = 15*3**4*5**2*(1-exp(-324*A))
ΦΩΩΩΩΩ = 6*3**5*5*(1-exp(-486*A))
ΩΩΩΩΩΩ = 3**6*(1-exp(-729*A))
total  = αααααα + Φααααα + Ωααααα + ΦΦαααα + ΦΩαααα + ΩΩαααα
       + ΦΦΦααα + ΦΦΩααα + ΦΩΩααα + ΩΩΩααα
       + ΦΦΦΦαα + ΦΦΦΩαα + ΦΦΩΩαα + ΦΩΩΩαα + ΩΩΩΩαα
       + ΦΦΦΦΦα + ΦΦΦΦΩα + ΦΦΦΩΩα + ΦΦΩΩΩα + ΦΩΩΩΩα + ΩΩΩΩΩα
       + ΦΦΦΦΦΦ + ΦΦΦΦΦΩ + ΦΦΦΦΩΩ + ΦΦΦΩΩΩ + ΦΦΩΩΩΩ + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ
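The formulae of part D all share one pattern: (number of peptides in the class) times (1 - exp(-k*A)), where k is the number of DNA sequences encoding each peptide of that class (one codon per α residue, two per Φ, three per Ω). The sketch below evaluates that pattern for every class; it is an illustrative reading of the formulae, not code from the specification, and `expected_sampled` and `lib_1e6` are hypothetical names.

```python
from math import exp, factorial

N_DNA = 31**6  # stop-free (NNK)6 DNA sequences


def expected_sampled(lsize):
    """Expected number of distinct peptides seen per class for lsize transformants.

    Keys are (number of alpha, number of phi, number of omega) residues.
    """
    a = lsize / N_DNA
    result = {}
    for n_alpha in range(7):
        for n_phi in range(7 - n_alpha):
            n_omega = 6 - n_alpha - n_phi
            arrangements = factorial(6) // (
                factorial(n_alpha) * factorial(n_phi) * factorial(n_omega)
            )
            members = arrangements * 12**n_alpha * 5**n_phi * 3**n_omega
            codings = 2**n_phi * 3**n_omega  # DNA sequences per peptide
            result[(n_alpha, n_phi, n_omega)] = members * (1 - exp(-codings * a))
    return result


lib_1e6 = expected_sampled(1.0e6)
total = sum(lib_1e6.values())
print(f"total = {total:.4E}   % sampled = {100 * total / 20**6:.2f}")
```

Calling `expected_sampled` with the other tabulated library sizes gives the corresponding sections of part C under the same assumptions.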
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Table 131: Sampling of a Library 
Encoded by (NNT)4(NNG)2

X can be F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G
Γ can be L²,R²,S,W,P,Q,M,T,K,V,A,E,G

Library comprises 8.55·10^6 amino-acid sequences; 1.47·10^7 DNA
sequences.

Total number of possible aa sequences = 8,555,625

x    LVPTARGFYCHIND
S    S
θ    VPTAGWQMKES
Ω    LR

The first, second, fifth, and sixth positions
can hold x or S; the third and fourth position can hold θ or
Ω. I have lumped sequences by the number of xs, Ss, θs, and
Ωs.

For example xxθΩSS stands for:
[xxθΩSS, xSθΩxS, xSθΩSx, SSθΩxx, SxθΩxS, SxθΩSx,
 xxΩθSS, xSΩθxS, xSΩθSx, SSΩθxx, SxΩθxS, SxΩθSx]

The following table shows the likelihood that
any particular DNA sequence will fall into one of the
defined classes.

Library size = 1.0            Sampling = .00001%

total.....   1.0000E+00       %sampled..   1.1688E-07
xxθθxx....   3.1524E-01       xxθΩxx....   2.2926E-01
xxΩΩxx....   4.1684E-02       xxθθxS....   1.8013E-01
xxθΩxS....   1.3101E-01       xxΩΩxS....   2.3819E-02
xxθθSS....   3.8600E-02       xxθΩSS....   2.8073E-02
xxΩΩSS....   5.1042E-03       xSθθSS....   3.6762E-03
xSθΩSS....   2.6736E-03       xSΩΩSS....   4.8611E-04
SSθθSS....   1.3129E-04       SSθΩSS....   9.5486E-05
SSΩΩSS....   1.7361E-05
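The class likelihoods tabulated for the (NNT)4(NNG)2 library can be reproduced from codon counts. This is a sketch under two assumptions that are read off the table rather than stated in it: the four NNT positions and the two NNG positions vary independently, and NNG stop codons are discarded, leaving 15 usable NNG codons. The names `class_probability`, `N_AA`, and `N_DNA` are illustrative.

```python
from math import comb

# NNT: 16 codons; S has two codons, the other 14 amino acids one each.
P_X, P_S = 14 / 16, 2 / 16
# NNG without TAG: 15 codons; L and R have two each, 11 amino acids one each.
P_THETA, P_OMEGA = 11 / 15, 4 / 15

N_AA = 15**4 * 13**2   # distinct amino-acid sequences
N_DNA = 16**4 * 15**2  # distinct stop-free DNA sequences


def class_probability(n_s, n_omega):
    """Probability that a stop-free DNA sequence has n_s S residues among the
    four NNT positions and n_omega residues from [LR] among the two NNG positions."""
    outer = comb(4, n_s) * P_X ** (4 - n_s) * P_S**n_s
    inner = comb(2, n_omega) * P_THETA ** (2 - n_omega) * P_OMEGA**n_omega
    return outer * inner


for n_s in range(5):
    for n_omega in range(3):
        print(n_s, n_omega, f"{class_probability(n_s, n_omega):.4E}")
```

The fifteen class probabilities sum to one, matching the "total" entry of the table.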
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Table 131: Sampling of a Library 
Encoded by (NNT)4(NNG)2
(continued)

The following sections show how many sequences
of each class are expected for libraries of different sizes.

Library size = 1.0000E+05
total = 9.9137E+04    fraction sampled = 1.1587E-02

Type       Number     %          Type       Number     %
xxθθxx     31416.9 (   .7)       xxθΩxx     22771.4 (  1.3)
xxΩΩxx      4112.4 (  2.7)       xxθθxS     17891.8 (  1.3)
xxθΩxS     12924.6 (  2.7)       xxΩΩxS      2318.5 (  5.3)
xxθθSS      3808.1 (  2.7)       xxθΩSS      2732.5 (  5.3)
xxΩΩSS       483.7 ( 10.3)       xSθθSS       357.8 (  5.3)
xSθΩSS       253.4 ( 10.3)       xSΩΩSS        43.7 ( 19.5)
SSθθSS        12.4 ( 10.3)       SSθΩSS         8.6 ( 19.5)
SSΩΩSS         1.4 ( 35.2)

Library size = 1.0000E+06
total = 9.2064E+05    fraction sampled = 1.0761E-01

xxθθxx    304783.9 (  6.6)       xxθΩxx    214394.0 ( 12.7)
xxΩΩxx     36508.6 ( 23.8)       xxθθxS    168452.5 ( 12.7)
xxθΩxS    114741.4 ( 23.8)       xxΩΩxS     18383.8 ( 41.9)
xxθθSS     33807.7 ( 23.8)       xxθΩSS     21666.6 ( 41.9)
xxΩΩSS      3114.6 ( 66.2)       xSθθSS      2837.3 ( 41.9)
xSθΩSS      1631.5 ( 66.2)       xSΩΩSS       198.4 ( 88.6)
SSθθSS        80.1 ( 66.2)       SSθΩSS        39.0 ( 88.6)
SSΩΩSS         3.9 ( 98.7)

Library size = 3.0000E+06
total = 2.3880E+06    fraction sampled = 2.7912E-01

xxθθxx    855709.5 ( 18.4)       xxθΩxx    565051.6 ( 33.4)
xxΩΩxx     85564.7 ( 55.7)       xxθθxS    443969.1 ( 33.4)
xxθΩxS    268917.8 ( 55.7)       xxΩΩxS     35281.3 ( 80.4)
xxθθSS     79234.7 ( 55.7)       xxθΩSS     41581.5 ( 80.4)
xxΩΩSS      4522.6 ( 96.1)       xSθθSS      5445.2 ( 80.4)
xSθΩSS      2369.0 ( 96.1)       xSΩΩSS       223.7 ( 99.9)
SSθθSS       116.3 ( 96.1)       SSθΩSS        43.9 ( 99.9)
SSΩΩSS         4.0 (100.0)
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Table 131: Sampling of a Library 
Encoded by (NNT) 4 (NNG) 2 
(continued) 

Library size = 8.5556E+06
total = 4.9303E+06    fraction sampled = 5.7626E-01

xxθθxx   2046301.0 ( 44.0)       xxθΩxx   1160645.0 ( 68.7)
xxΩΩxx    138575.9 ( 90.2)       xxθθxS    911935.6 ( 68.7)
xxθΩxS    435524.3 ( 90.2)       xxΩΩxS     43480.7 ( 99.0)
xxθθSS    128324.1 ( 90.2)       xxθΩSS     51245.1 ( 99.0)
xxΩΩSS      4703.6 (100.0)       xSθθSS      6710.7 ( 99.0)
xSθΩSS      2463.8 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+07
total = 5.3667E+06    fraction sampled = 6.2727E-01

xxθθxx   2289093.0 ( 49.2)       xxθΩxx   1254877.0 ( 74.2)
xxΩΩxx    143467.0 ( 93.4)       xxθθxS    985974.9 ( 74.2)
xxθΩxS    450896.3 ( 93.4)       xxΩΩxS     43710.7 ( 99.6)
xxθθSS    132853.4 ( 93.4)       xxθΩSS     51516.1 ( 99.6)
xxΩΩSS      4703.9 (100.0)       xSθθSS      6746.2 ( 99.6)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 3.0000E+07
total = 7.8961E+06    fraction sampled = 9.2291E-01

xxθθxx   4040589.0 ( 86.9)       xxθΩxx   1661409.0 ( 98.3)
xxΩΩxx    153619.1 (100.0)       xxθθxS   1305393.0 ( 98.3)
xxθΩxS    482802.9 (100.0)       xxΩΩxS     43904.0 (100.0)
xxθθSS    142254.4 (100.0)       xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)       xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

Library size = 5.0000E+07
total = 8.3956E+06    fraction sampled = 9.8130E-01

xxθθxx   4491779.0 ( 96.6)       xxθΩxx   [illegible]
xxΩΩxx    153663.8 (100.0)       xxθθxS   [illegible]
xxθΩxS    482943.4 (100.0)       xxΩΩxS   [illegible]
xxθθSS    142295.8 (100.0)       xxθΩSS   [illegible]
xxΩΩSS      4704.0 (100.0)       xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+08
total = 8.5503E+06    fraction sampled = [illegible]

xxθθxx   [illegible].0 ( 99.9)   xxθΩxx   [illegible]
xxΩΩxx    153664.0 (100.0)       xxθθxS   [illegible]
xxθΩxS    482944.0 (100.0)       xxΩΩxS     43904.0 (100.0)
xxθθSS    142296.0 (100.0)       xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)       xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)
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Table 132: Relative efficiencies of 
various simple variegation codons

                        Number of codons
                   5              6              7
                #dna/#aa       #dna/#aa       #dna/#aa
                [#DNA]         [#DNA]         [#DNA]
vgCodon         (#aa)          (#aa)          (#aa)

NNK              8.95          13.86          21.49
assuming        [2.86·10^7]    [8.87·10^8]    [2.75·10^10]
stops vanish    (3.2·10^6)     (6.4·10^7)     (1.28·10^9)

NNT              1.38           1.47           1.57
                [1.05·10^6]    [1.68·10^7]    [2.68·10^8]
                (7.59·10^5)    (1.14·10^7)    (1.71·10^8)

NNG              2.04           2.36           2.72
assuming        [7.59·10^5]    [1.14·10^7]    [1.71·10^8]
stops vanish    (3.7·10^5)     (4.83·10^6)    (6.27·10^7)
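The entries of Table 132 are powers of two small integers: the number of usable codons and the number of distinct amino acids each scheme encodes. The sketch below recomputes them; the codon and amino-acid counts (31 and 20 for NNK, 16 and 15 for NNT, 15 and 13 for NNG, stop codons removed) are inferred from the tabulated values, and `SCHEMES` and `efficiency` are hypothetical names.

```python
# (usable DNA codons, distinct amino acids) per variegated codon,
# with stop codons assumed to vanish where the table says so.
SCHEMES = {"NNK": (31, 20), "NNT": (16, 15), "NNG": (15, 13)}


def efficiency(n_codons, n_amino_acids, length):
    """Return (#dna/#aa, #DNA, #aa) for `length` variegated codons."""
    n_dna = n_codons**length
    n_aa = n_amino_acids**length
    return n_dna / n_aa, n_dna, n_aa


for name, (n_codons, n_aa) in SCHEMES.items():
    for length in (5, 6, 7):
        ratio, dna, aa = efficiency(n_codons, n_aa, length)
        print(f"{name} x{length}: {ratio:6.2f}  [{dna:.2e}]  ({aa:.2e})")
```

A ratio near 1 means almost every DNA sequence encodes a different peptide, which is why NNT samples sequence space more efficiently than NNK at the cost of encoding fewer amino acids.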
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. Table 155 

Distance in Å between alpha carbons in octapeptides:

Extended Strand: angle of Cα1-Cα2-Cα3 = 138°

        1      2      3      4      5      6      7
1
2      3.8
3      7.1    3.8
4     10.7    7.1    3.8
5     14.2   10.7    7.1    3.8
6     17.7   14.1   10.7    7.1    3.8
7     21.2   17.7   14.1   10.6    7.0    3.8
8     24.6   20.9   17.5   13.9   10.6    7.0    3.8

Reverse turn between residues 4 and 5.

        1      2      3      4      5      6      7
1
2      3.8
3      7.1    3.8
4     10.6    7.0    3.8
5     11.6    8.0    6.1    3.8
6      9.0    5.8    5.5    5.6    3.8
7      6.2    4.1    6.3    8.0    7.0    3.8
8      5.8    6.0    9.1   11.6   10.7    7.2    3.8

Alpha helix: angle of Cα1-Cα2-Cα3 = 93°

        1      2      3      4      5      6      7
1
2      3.8
3      5.5    3.8
4      5.1    5.4    3.8
5      6.6    5.3    5.5    3.8
6      9.3    7.0    5.6    5.5    3.8
7     10.4    9.3    6.9    5.4    5.5    3.8
8     11.3   10.7    9.5    6.8    5.6    5.6    3.8
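The second off-diagonal of each matrix in Table 155 can be related to the quoted Cα1-Cα2-Cα3 angle by elementary geometry: two virtual bonds of 3.8 Å meeting at angle θ span a distance of 2 · 3.8 · sin(θ/2). The sketch below is a consistency check under that assumption only; `CA_CA` and `i_to_i_plus_2` are illustrative names, and longer-range distances also depend on dihedral angles that the table does not state.

```python
from math import radians, sin

CA_CA = 3.8  # Å, distance between consecutive alpha carbons (from the table)


def i_to_i_plus_2(angle_deg, bond=CA_CA):
    """Distance between alpha carbons i and i+2 for a given virtual bond angle."""
    return 2 * bond * sin(radians(angle_deg) / 2)


print(f"extended strand (138 deg): {i_to_i_plus_2(138):.1f} Å")
print(f"alpha helix      (93 deg): {i_to_i_plus_2(93):.1f} Å")
```

These values agree with the 7.1 Å and 5.5 Å entries for residues 1 and 3 in the extended-strand and alpha-helix matrices.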
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Table 156 

Distances between alpha carbons in closed mini-proteins of
the form disulfide cyclo(CXXXXC)

Minimum distance
        1      2      3      4      5      6
1
2      3.8
3      5.9    3.8
4      5.6    6.0    3.8
5      4.7    5.9    6.0    3.8
6      4.8    5.3    5.1    5.2    3.8

Average distance
        1      2      3      4      5      6
1
2      3.8
3      6.3    3.8
4      7.5    6.4    3.8
5      7.1    7.5    6.3    3.8
6      5.6    7.5    7.7    6.4    3.8

Maximum distance
        1      2      3      4      5      6
1
2      3.8
3      6.7    3.8
4      9.0    6.9    3.8
5      8.7    8.8    6.8    3.8
6      6.6    9.2    9.1    6.8    3.8
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Table 820: Peptide Phage 

Antibiotic 

2^5 ^ lve stre Ptavidin Resistance 
5 Bxpdmcr Peptide s*g. Marker 

AE6PCHPQP - - CQSYIEGRIV B... 

DEV(F) AE-PCHPQYRLCQRPLKQPPPPPPAE... 
Dev(E) AE-LCHPQPPRCNLFRKVPPPPPPAE... 
10 EBQ6 AE6PCHP0PPRCYIEGRIV -E... 

11111111112222222 
123 45678901234567890123456 



15 
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Table 838: Streptavidin-binding 
disulfide-constrained peptides

#2     glu gly tyr   cys his pro gln phe cys   pro ser    4
#4     glu gly his   cys his pro gln phe cys   ser ser    3
#5     glu gly leu   cys his pro gln phe cys   gly ser    2
#8     glu gly asp   cys his pro gln phe cys   ser ser    2
#1     glu gly asn   cys his pro gln phe cys   pro ser    1
#3     glu gly asp   cys his pro gln phe cys   arg ser    1
#13    glu gly asp   cys his pro gln phe cys   val ser    1
consensus            cys his pro gln phe cys

Table 839: Sequences Obtained by
Enrichment over BSA

#21    glu gly gly cys phe    lys arg asn   cys tyr ser    1
#22    glu gly his cys asp    lys lys ile   cys leu ser    1
#23    glu gly P*"* cys his   thr ala ala   cys phe ser    1
#24    glu gly his cys tyr    lys gly val   cys ser ser    1
#25    glu gly his cys asp    lys trp arg   cys pro ser    1
#26    glu gly ile cys tyr    arg leu asp   cys ile ser    1
#27    glu gly gly cys phe    pro trp his   cys phe ser    1
#28    glu gly ser cys asp    ser leu arg   cys asp ser    1
No consensus observed.
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CLAIMS 

1. In a process for developing novel binding proteins
with a desired binding activity against a particular target
material comprising providing a population of genetic packages,
each displaying one or more copies of a particular potential
binding domain as part of a chimeric outer surface protein
thereof, said potential binding domain not being natively
associated with the outer surface of said package, said
population collectively displaying a plurality of different
potential binding domains, the differentiation among said
plurality of different potential binding domains occurring
through the at least partially random variation of one or more
predetermined amino acid positions, but not all amino acid
positions, of said parental binding domain to randomly obtain
at each said variable position an amino acid belonging to a
predetermined set of two or more amino acids, the amino acids
of said set occurring at said position in predetermined
expected proportions; contacting the packages with the target
material; and separating the packages according to their
affinity for said target material;

the improvement comprising essentially each said
potential binding domain being a mini-protein sequence of less
than forty amino acids and having at least one intrachain
covalent crosslink between at least a first amino acid position
and a second amino acid position thereof, the amino acids at
said first and second positions being invariant in all of the
chimeric proteins displayed by said population, with those
residues which participate in the formation of a covalent
crosslink being invariant throughout said population, with the
proviso that when the crosslink is in the form of a disulfide
bond, the potential binding domain is a micro-protein sequence
of less than forty amino acids.
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a^no acid positions axe cysteines ' ^ 

not « t*an nine aT^t ^ " ^ * *~ * 

5. The method of >. . 

secondary structure is sele^ rrcTJT 

<*> - « ^i*. a turn, and * t strL T ZT* "* 
IB and an a ieta,. ^ (0 , //j*™*' ,b) 311 « teli *< a turn, 

ic) a 0 strand, a ton, and a « 

^ preferably includes two clustered cysteins. 
domain has ^eel^d T", 8 Wh6rein 

of 1-4, a-s^n.^^ ^ " COnnec "vi ty patter. 

domain substantially corresnonT* ^cro-protein 
conotoxin. Responds an sequence to a xnu- or 

12. The method of claim 6 wherein i-ho ™4 
domain substantially corresoond,, < ^cro-protein 
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•n I (ST ) . the bee venom apamin. or a 
t^in Witlr *e scorpion tcin. — — - 

™, "^Iti at™, such as sine. iron, copper or 
5 crosslink includes a metal atom, 

cobalt • o£ any of claims 1-13 "herein at least 

14 . The method o£ any o tential binding 

on. variable amino acid 

^ine was encoded by «, bks, and SKT. 

10 the group consisti^ ^ ^ ^ _ o£ 

15 . The method o£ any ^ or pote ntial binding 
- variable amino acid ^ ^ ^ .elected £ rom 
aomain "as encoded by a ™* Y m . 

■*»» consist^ of « - ^ ^ whsrein leMt 



replicable genetic package is tilamentous phage. 

20 other than phage ^^^Ttherein the potential 

18 ' LTZ maior coat protein of a 

binding domain is fused « fragment thereof, or with 

Uiamentous phage or 'V^^L P^e or an assemblage 
the gene III protein of a riJ.a™= 
25 fragment thereof. o£ claims 1-16 "herein the 

x9 . ae method of any 1 ceU _ MC h as strains 

replicable genetic • e*^*^ ^binurim. 

30 sasillus suiiilla. saw .eouence, and the potential 

pe riplasmic secretion signal outer surface protein 

binding domain w fus«a » Pn ospholipaSe A, or 

fl uch as the lamB protein. Qmj*. 0*fi.J*Z 
piiin. or an assemblage segment thereof. 
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20. The method of anv nf „n j 
Potion is c^cteri/ed ^ l C i B T la 1 - I 7 heMln Baid 
different potenti.1 „• „. """Play of at least 10' 

potentially^ SLTSS' T W " ereln ' ^ - 
= that it will be ^ZZT- T 9 , protabili ^ 
Population is at lea^L " "* Pa '* a9e ta Sal<S 

21 i n„ ' 8 at least 90%. 

^playiu^ or^e cop^ *~ r ^ "* 
*»in as part of a c^Tolr 
0 said potential bindinT^nl™ 3 " f"* - " *' 
«U» the outer surfa J oTe^njL * ^ —«*■*«' 
collectively displavio, , ' * CeUs ' Baid « 1 T 

* P ylDS a ««"ity =f different notenM*, 

< -est PartLlTrandoT^^ T ^ * 
amino acid positions, but not a!l L ' P«deterMined 

~ of ^rzirrcir tr 9 " 3 to * — «— ~ 

occurring at said positionT^ ^ ° f Mt 

proportions. Predetermined expected 

essentially each said potential w,,,, , , 

protein seouence of less t^ si^^M * ^ 

1-t one intrachain covalent T^V " 

first amino acid nositior, ^ between at least a 

thereof, the -J^"^™ T 

being invariant in all of «. * *** second positions 

-aid populatS ZZL \h Pr ° teins dls * 1 ^ »» 

forJioTo T"^ *** in the 

•aid population ^ 

sulfide ^ZZZ^J?';*** i= a 

of less than 40 residuef ^ iS * "^-protein 
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